[Flang][mlir] - Translation of delayed privatization for deferred target-tasks #155348

bhandarkar-pranav · 2025-08-26T03:21:10Z

This PR adds support for translating the private clause on deferred target tasks, that is, omp.target operations with the nowait clause.

An offloading call for a deferred target task is non-blocking: the offloading host task continues its execution after issuing the call. The key problem is therefore to ensure that the data needed to initialize private variables in the target task persists even after the host task has completed.
We do this in a new pass called PrepareForOMPOffloadPrivatizationPass. For a privatized variable that needs its host counterpart for initialization (such as the shape of the data from the descriptor when an allocatable is privatized, or the value of the data when an allocatable is firstprivatized):

The pass allocates memory on the heap.
It then initializes this memory using the init and copy (for firstprivate) regions of the corresponding omp::PrivateClauseOp.
Finally, the memory allocated on the heap is freed using the dealloc region of the same omp::PrivateClauseOp instance. This step is not straightforward, though, because we cannot simply free memory that is going to be used by another thread without any synchronization. So, for deallocation, we create an omp.task after the omp.target and synchronize the two with a dummy dependency (using the depend clause). The deallocation is done in this newly created omp.task.
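
As a rough analogy in plain C++ (not MLIR, and not the pass's actual implementation), the allocate, initialize, and deferred-free steps can be sketched with std::async. HostDescriptor, targetTask, launchDeferred and runExample are hypothetical names made up for this illustration. The std::shared_future stands in for the dummy depend-clause dependency that orders the cleanup task after the target task.

```cpp
#include <cassert>
#include <future>
#include <vector>

// Hypothetical stand-in for a host variable whose contents the deferred
// target task needs in order to initialize its private copy.
struct HostDescriptor {
  std::vector<int> data;
};

// Stand-in for the deferred (nowait) target task. It may run after the
// scope that launched it has already ended, so it only reads the heap copy.
int targetTask(const HostDescriptor *heapCopy) {
  int sum = 0;
  for (int v : heapCopy->data)
    sum += v;
  return sum;
}

// Launches the "target task" and a trailing cleanup task, without blocking.
std::future<int> launchDeferred(const HostDescriptor &hostVar) {
  // Steps 1 and 2: allocate on the heap and initialize from the host variable.
  auto *heapCopy = new HostDescriptor(hostVar);

  // The deferred target task works on the heap copy, never on hostVar.
  std::shared_future<int> target =
      std::async(std::launch::async, targetTask, heapCopy).share();

  // Step 3: a separate task frees the heap copy, but only after the target
  // task has finished. Waiting on `target` models the dummy dependency.
  return std::async(std::launch::async, [target, heapCopy] {
    int result = target.get();
    delete heapCopy;
    return result;
  });
}

// The host variable goes out of scope before the result is requested.
int runExample() {
  std::future<int> pending;
  {
    HostDescriptor hostVar{{1, 2, 3, 4}};
    pending = launchDeferred(hostVar);
  } // hostVar is destroyed here; both tasks only touch the heap copy.
  return pending.get();
}
```

Freeing the copy directly in launchDeferred would race with targetTask, which is the reason the deallocation is placed in its own task that depends on the target task.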

llvmbot · 2025-08-26T03:21:44Z

@llvm/pr-subscribers-mlir-llvm
@llvm/pr-subscribers-flang-fir-hlfir

@llvm/pr-subscribers-flang-driver

Author: Pranav Bhandarkar (bhandarkar-pranav)

Changes

This PR adds support for translating the private clause on deferred target tasks, that is, omp.target operations with the nowait clause.

An offloading call for a deferred target task is non-blocking: the offloading host task continues its execution after issuing the call. The key problem is therefore to ensure that the data needed to initialize private variables in the target task persists even after the host task has completed.
We do this in a new pass called PrepareForOMPOffloadPrivatizationPass. For a privatized variable that needs its host counterpart for initialization (such as the shape of the data from the descriptor when an allocatable is privatized, or the value of the data when an allocatable is firstprivatized):

The pass allocates memory on the heap.
It then initializes this memory by copying the contents of the host variable to the newly allocated location on the heap.
Then, the pass updates all the omp.map.info operations that pointed to the host variable so that they point to the copy located on the heap.

The pass uses a rewrite pattern applied with the greedy pattern rewrite driver, which also performs some constant folding and DCE. As a result, a number of lit tests had to be updated: in GEPs, constants get folded into the indices and truncated to i32, and in some tests sequences of insertvalue and extractvalue instructions cancel out.
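
Many of the updated CHECK lines change `mul i64 12, %[[extent]]` into `mul i64 %[[extent]], 12`. This looks like the usual canonicalization that moves the constant operand of a commutative operation to the right-hand side. The toy C++ below only illustrates that reordering. Operand, Mul, canonicalize and print are hypothetical names for this sketch and are not MLIR APIs.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <utility>

// Toy operand: either a constant or a named SSA value (hypothetical model).
struct Operand {
  std::optional<long> constant;
  std::string name;
};

// Toy commutative multiply.
struct Mul {
  Operand lhs, rhs;
};

// If only the left operand is a constant, swap so the constant ends up on
// the right. Multiplication is commutative, so the result is unchanged.
Mul canonicalize(Mul op) {
  if (op.lhs.constant && !op.rhs.constant)
    std::swap(op.lhs, op.rhs);
  return op;
}

std::string show(const Operand &o) {
  return o.constant ? std::to_string(*o.constant) : o.name;
}

// Prints the operation in an LLVM-IR-like form for comparison with the tests.
std::string print(const Mul &op) {
  return "mul i64 " + show(op.lhs) + ", " + show(op.rhs);
}
```

With this model, a multiply built as constant 12 times `%extent` prints with `%extent` first after canonicalization, matching the direction of the test updates.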

Patch is 79.33 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/155348.diff

30 Files Affected:

(modified) flang/include/flang/Optimizer/Passes/Pipelines.h (+1)
(modified) flang/lib/Optimizer/Passes/Pipelines.cpp (+6)
(modified) flang/test/Driver/tco-emit-final-mlir.fir (+2-2)
(modified) flang/test/Driver/tco-test-gen.fir (+2-3)
(modified) flang/test/Fir/alloc-32.fir (+1-1)
(modified) flang/test/Fir/alloc.fir (+9-8)
(modified) flang/test/Fir/arrexp.fir (+2-2)
(modified) flang/test/Fir/basic-program.fir (+2)
(modified) flang/test/Fir/box.fir (+3-3)
(modified) flang/test/Fir/boxproc.fir (+4-12)
(modified) flang/test/Fir/embox.fir (+3-3)
(modified) flang/test/Fir/omp-reduction-embox-codegen.fir (+3-3)
(modified) flang/test/Fir/optional.fir (+1-2)
(modified) flang/test/Fir/pdt.fir (+3-3)
(modified) flang/test/Fir/rebox.fir (+9-9)
(modified) flang/test/Fir/select.fir (+1-1)
(modified) flang/test/Fir/target.fir (-4)
(modified) flang/test/Fir/tbaa-codegen2.fir (+3-9)
(modified) flang/test/Integration/OpenMP/map-types-and-sizes.f90 (+7-7)
(modified) flang/test/Lower/allocatable-polymorphic.f90 (-4)
(modified) flang/test/Lower/forall/character-1.f90 (+2-2)
(added) mlir/include/mlir/Dialect/LLVMIR/Transforms/OpenMPOffloadPrivatizationPrepare.h (+23)
(modified) mlir/include/mlir/Dialect/LLVMIR/Transforms/Passes.td (+12)
(modified) mlir/include/mlir/Dialect/OpenMP/OpenMPOps.td (+2-2)
(modified) mlir/lib/Dialect/LLVMIR/Transforms/CMakeLists.txt (+2)
(added) mlir/lib/Dialect/LLVMIR/Transforms/OpenMPOffloadPrivatizationPrepare.cpp (+425)
(modified) mlir/lib/Target/LLVMIR/Dialect/OpenMP/OpenMPToLLVMIRTranslation.cpp (+2-9)
(modified) mlir/lib/Tools/mlir-opt/MlirOptMain.cpp (+1)
(added) mlir/test/Dialect/LLVMIR/omp-offload-privatization-prepare.mlir (+167)
(modified) mlir/test/Target/LLVMIR/openmp-todo.mlir (-18)

diff --git a/flang/include/flang/Optimizer/Passes/Pipelines.h b/flang/include/flang/Optimizer/Passes/Pipelines.h
index a3f59ee8dd013..17d48f46e4b9b 100644
--- a/flang/include/flang/Optimizer/Passes/Pipelines.h
+++ b/flang/include/flang/Optimizer/Passes/Pipelines.h
@@ -22,6 +22,7 @@
 #include "mlir/Conversion/SCFToControlFlow/SCFToControlFlow.h"
 #include "mlir/Dialect/GPU/IR/GPUDialect.h"
 #include "mlir/Dialect/LLVMIR/LLVMAttrs.h"
+#include "mlir/Dialect/LLVMIR/Transforms/OpenMPOffloadPrivatizationPrepare.h"
 #include "mlir/Pass/PassManager.h"
 #include "mlir/Transforms/GreedyPatternRewriteDriver.h"
 #include "mlir/Transforms/Passes.h"
diff --git a/flang/lib/Optimizer/Passes/Pipelines.cpp b/flang/lib/Optimizer/Passes/Pipelines.cpp
index ca8e820608688..6a11461cd8380 100644
--- a/flang/lib/Optimizer/Passes/Pipelines.cpp
+++ b/flang/lib/Optimizer/Passes/Pipelines.cpp
@@ -403,6 +403,12 @@ void createMLIRToLLVMPassPipeline(mlir::PassManager &pm,
 
   // Add codegen pass pipeline.
   fir::createDefaultFIRCodeGenPassPipeline(pm, config, inputFilename);
+
+  // Run a pass to prepare for translation of delayed privatization in the
+  // context of deferred target tasks.
+  addNestedPassConditionally<mlir::LLVM::LLVMFuncOp>(pm, disableFirToLlvmIr,[&]() {
+    return mlir::LLVM::createPrepareForOMPOffloadPrivatizationPass();
+  });
 }
 
 } // namespace fir
diff --git a/flang/test/Driver/tco-emit-final-mlir.fir b/flang/test/Driver/tco-emit-final-mlir.fir
index 75f8f153127af..177810cf41378 100644
--- a/flang/test/Driver/tco-emit-final-mlir.fir
+++ b/flang/test/Driver/tco-emit-final-mlir.fir
@@ -13,7 +13,7 @@
 // CHECK: llvm.return
 // CHECK-NOT: func.func
 
-func.func @_QPfoo() {
+func.func @_QPfoo() -> !fir.ref<i32> {
   %1 = fir.alloca i32
-  return
+  return %1 : !fir.ref<i32>
 }
diff --git a/flang/test/Driver/tco-test-gen.fir b/flang/test/Driver/tco-test-gen.fir
index 38d4e50ecf3aa..15483f7ee3534 100644
--- a/flang/test/Driver/tco-test-gen.fir
+++ b/flang/test/Driver/tco-test-gen.fir
@@ -42,11 +42,10 @@ func.func @_QPtest(%arg0: !fir.ref<i32> {fir.bindc_name = "num"}, %arg1: !fir.re
 // CHECK-SAME:      %[[ARG2:.*]]: !llvm.ptr {fir.bindc_name = "ub", llvm.nocapture},
 // CHECK-SAME:      %[[ARG3:.*]]: !llvm.ptr {fir.bindc_name = "step", llvm.nocapture}) {
 
+// CMPLX:           %[[VAL_3:.*]] = llvm.mlir.constant(0 : index) : i64
+// CMPLX:           %[[VAL_2:.*]] = llvm.mlir.constant(1 : index) : i64
 // CMPLX:           %[[VAL_0:.*]] = llvm.mlir.constant(1 : i64) : i64
 // CMPLX:           %[[VAL_1:.*]] = llvm.alloca %[[VAL_0]] x i32 {bindc_name = "i"} : (i64) -> !llvm.ptr
-// CMPLX:           %[[VAL_2:.*]] = llvm.mlir.constant(1 : index) : i64
-// CMPLX:           %[[VAL_3:.*]] = llvm.mlir.constant(0 : index) : i64
-// CMPLX:           %[[VAL_4:.*]] = llvm.mlir.constant(1 : i64) : i64
 
 // SIMPLE:          %[[VAL_3:.*]] = llvm.mlir.constant(0 : index) : i64
 // SIMPLE:          %[[VAL_2:.*]] = llvm.mlir.constant(1 : index) : i64
diff --git a/flang/test/Fir/alloc-32.fir b/flang/test/Fir/alloc-32.fir
index a3cbf200c24fc..f57f6ce6fcf5e 100644
--- a/flang/test/Fir/alloc-32.fir
+++ b/flang/test/Fir/alloc-32.fir
@@ -19,7 +19,7 @@ func.func @allocmem_scalar_nonchar() -> !fir.heap<i32> {
 // CHECK-LABEL: define ptr @allocmem_scalar_dynchar(
 // CHECK-SAME: i32 %[[len:.*]])
 // CHECK: %[[mul1:.*]] = sext i32 %[[len]] to i64
-// CHECK: %[[mul2:.*]] = mul i64 1, %[[mul1]]
+// CHECK: %[[mul2:.*]] = mul i64 %[[mul1]], 1
 // CHECK: %[[cmp:.*]] = icmp sgt i64 %[[mul2]], 0
 // CHECK: %[[sz:.*]] = select i1 %[[cmp]], i64 %[[mul2]], i64 1
 // CHECK: %[[trunc:.*]] = trunc i64 %[[sz]] to i32
diff --git a/flang/test/Fir/alloc.fir b/flang/test/Fir/alloc.fir
index 8da8b828c18b9..0d3ce323d0d7c 100644
--- a/flang/test/Fir/alloc.fir
+++ b/flang/test/Fir/alloc.fir
@@ -86,7 +86,7 @@ func.func @alloca_scalar_dynchar_kind(%l : i32) -> !fir.ref<!fir.char<2,?>> {
 // CHECK-LABEL: define ptr @allocmem_scalar_dynchar(
 // CHECK-SAME: i32 %[[len:.*]])
 // CHECK: %[[mul1:.*]] = sext i32 %[[len]] to i64
-// CHECK: %[[mul2:.*]] = mul i64 1, %[[mul1]]
+// CHECK: %[[mul2:.*]] = mul i64 %[[mul1]], 1
 // CHECK: %[[cmp:.*]] = icmp sgt i64 %[[mul2]], 0
 // CHECK: %[[size:.*]] = select i1 %[[cmp]], i64 %[[mul2]], i64 1
 // CHECK: call ptr @malloc(i64 %[[size]])
@@ -98,7 +98,7 @@ func.func @allocmem_scalar_dynchar(%l : i32) -> !fir.heap<!fir.char<1,?>> {
 // CHECK-LABEL: define ptr @allocmem_scalar_dynchar_kind(
 // CHECK-SAME: i32 %[[len:.*]])
 // CHECK: %[[mul1:.*]] = sext i32 %[[len]] to i64
-// CHECK: %[[mul2:.*]] = mul i64 2, %[[mul1]]
+// CHECK: %[[mul2:.*]] = mul i64 %[[mul1]], 2
 // CHECK: %[[cmp:.*]] = icmp sgt i64 %[[mul2]], 0
 // CHECK: %[[size:.*]] = select i1 %[[cmp]], i64 %[[mul2]], i64 1
 // CHECK: call ptr @malloc(i64 %[[size]])
@@ -185,7 +185,7 @@ func.func @alloca_dynarray_of_nonchar2(%e: index) -> !fir.ref<!fir.array<?x?xi32
 
 // CHECK-LABEL: define ptr @allocmem_dynarray_of_nonchar(
 // CHECK-SAME: i64 %[[extent:.*]])
-// CHECK: %[[prod1:.*]] = mul i64 12, %[[extent]]
+// CHECK: %[[prod1:.*]] = mul i64 %[[extent]], 12
 // CHECK: %[[cmp:.*]] = icmp sgt i64 %[[prod1]], 0
 // CHECK: %[[size:.*]] = select i1 %[[cmp]], i64 %[[prod1]], i64 1
 // CHECK: call ptr @malloc(i64 %[[size]])
@@ -196,7 +196,7 @@ func.func @allocmem_dynarray_of_nonchar(%e: index) -> !fir.heap<!fir.array<3x?xi
 
 // CHECK-LABEL: define ptr @allocmem_dynarray_of_nonchar2(
 // CHECK-SAME: i64 %[[extent:.*]])
-// CHECK: %[[prod1:.*]] = mul i64 4, %[[extent]]
+// CHECK: %[[prod1:.*]] = mul i64 %[[extent]], 4
 // CHECK: %[[prod2:.*]] = mul i64 %[[prod1]], %[[extent]]
 // CHECK: %[[cmp:.*]] = icmp sgt i64 %[[prod2]], 0
 // CHECK: %[[size:.*]] = select i1 %[[cmp]], i64 %[[prod2]], i64 1
@@ -227,7 +227,7 @@ func.func @alloca_dynarray_of_char2(%e : index) -> !fir.ref<!fir.array<?x?x!fir.
 
 // CHECK-LABEL: define ptr @allocmem_dynarray_of_char(
 // CHECK-SAME: i64 %[[extent:.*]])
-// CHECK: %[[prod1:.*]] = mul i64 60, %[[extent]]
+// CHECK: %[[prod1:.*]] = mul i64 %[[extent]], 60
 // CHECK: %[[cmp:.*]] = icmp sgt i64 %[[prod1]], 0
 // CHECK: %[[size:.*]] = select i1 %[[cmp]], i64 %[[prod1]], i64 1
 // CHECK: call ptr @malloc(i64 %[[size]])
@@ -238,7 +238,7 @@ func.func @allocmem_dynarray_of_char(%e : index) -> !fir.heap<!fir.array<3x?x!fi
 
 // CHECK-LABEL: define ptr @allocmem_dynarray_of_char2(
 // CHECK-SAME: i64 %[[extent:.*]])
-// CHECK: %[[prod1:.*]] = mul i64 20, %[[extent]]
+// CHECK: %[[prod1:.*]] = mul i64 %[[extent]], 20
 // CHECK: %[[prod2:.*]] = mul i64 %[[prod1]], %[[extent]]
 // CHECK: %[[cmp:.*]] = icmp sgt i64 %[[prod2]], 0
 // CHECK: %[[size:.*]] = select i1 %[[cmp]], i64 %[[mul2]], i64 1
@@ -286,7 +286,7 @@ func.func @allocmem_dynarray_of_dynchar(%l: i32, %e : index) -> !fir.heap<!fir.a
 // CHECK-LABEL: define ptr @allocmem_dynarray_of_dynchar2(
 // CHECK-SAME: i32 %[[len:.*]], i64 %[[extent:.*]])
 // CHECK: %[[a:.*]] = sext i32 %[[len]] to i64
-// CHECK: %[[prod1:.*]] = mul i64 2, %[[a]]
+// CHECK: %[[prod1:.*]] = mul i64 %[[a]], 2
 // CHECK: %[[prod2:.*]] = mul i64 %[[prod1]], %[[extent]]
 // CHECK: %[[prod3:.*]] = mul i64 %[[prod2]], %[[extent]]
 // CHECK: %[[cmp:.*]] = icmp sgt i64 %[[prod3]], 0
@@ -366,12 +366,13 @@ func.func @allocmem_array_with_holes_dynchar(%arg0: index, %arg1: index) -> !fir
 // CHECK:    %[[VAL_0:.*]] = alloca { ptr, i64, i32, i8, i8, i8, i8, ptr, [1 x i64] }, i64 1
 // CHECK:    %[[VAL_3:.*]] = alloca { ptr, i64, i32, i8, i8, i8, i8, [1 x [3 x i64]], ptr, [1 x i64] }, i64 1
 // CHECK:    %[[VAL_2:.*]] = alloca { ptr, i64, i32, i8, i8, i8, i8, ptr, [1 x i64] }, i64 1
-
+func.func private @foo(%0: !fir.ref<!fir.class<none>>, %1: !fir.ref<!fir.class<!fir.array<?xnone>>>, %2: !fir.ref<!fir.box<none>>, %3: !fir.ref<!fir.box<!fir.array<?xnone>>>)
 func.func @alloca_unlimited_polymorphic_box() {
   %0 = fir.alloca !fir.class<none>
   %1 = fir.alloca !fir.class<!fir.array<?xnone>>
   %2 = fir.alloca !fir.box<none>
   %3 = fir.alloca !fir.box<!fir.array<?xnone>>
+  fir.call @foo(%0, %1, %2, %3) : (!fir.ref<!fir.class<none>>, !fir.ref<!fir.class<!fir.array<?xnone>>>, !fir.ref<!fir.box<none>>, !fir.ref<!fir.box<!fir.array<?xnone>>>) -> ()
   return
 }
 // Note: allocmem of fir.box are not possible (fir::HeapType::verify does not
diff --git a/flang/test/Fir/arrexp.fir b/flang/test/Fir/arrexp.fir
index e8ec8ac79e0c2..2eb717228d998 100644
--- a/flang/test/Fir/arrexp.fir
+++ b/flang/test/Fir/arrexp.fir
@@ -143,9 +143,9 @@ func.func @f6(%arg0: !fir.box<!fir.array<?xf32>>, %arg1: f32) {
   %c9 = arith.constant 9 : index
   %c10 = arith.constant 10 : index
 
-  // CHECK: %[[EXT_GEP:.*]] = getelementptr {{.*}} %[[A]], i32 0, i32 7, i64 0, i32 1
+  // CHECK: %[[EXT_GEP:.*]] = getelementptr {{.*}} %[[A]], i32 0, i32 7, i32 0, i32 1
   // CHECK: %[[EXTENT:.*]] = load i64, ptr %[[EXT_GEP]]
-  // CHECK: %[[SIZE:.*]] = mul i64 4, %[[EXTENT]]
+  // CHECK: %[[SIZE:.*]] = mul i64 %[[EXTENT]], 4
   // CHECK: %[[CMP:.*]] = icmp sgt i64 %[[SIZE]], 0
   // CHECK: %[[SZ:.*]] = select i1 %[[CMP]], i64 %[[SIZE]], i64 1
   // CHECK: %[[MALLOC:.*]] = call ptr @malloc(i64 %[[SZ]])
diff --git a/flang/test/Fir/basic-program.fir b/flang/test/Fir/basic-program.fir
index c9fe53bf093a1..6bad03dded24d 100644
--- a/flang/test/Fir/basic-program.fir
+++ b/flang/test/Fir/basic-program.fir
@@ -158,4 +158,6 @@ func.func @_QQmain() {
 // PASSES-NEXT:  LowerNontemporalPass
 // PASSES-NEXT: FIRToLLVMLowering
 // PASSES-NEXT: ReconcileUnrealizedCasts
+// PASSES-NEXT: 'llvm.func' Pipeline
+// PASSES-NEXT: PrepareForOMPOffloadPrivatizationPass
 // PASSES-NEXT: LLVMIRLoweringPass
diff --git a/flang/test/Fir/box.fir b/flang/test/Fir/box.fir
index c0cf3d8375983..760fbd4792122 100644
--- a/flang/test/Fir/box.fir
+++ b/flang/test/Fir/box.fir
@@ -57,7 +57,7 @@ func.func @fa(%a : !fir.ref<!fir.array<100xf32>>) {
 // CHECK-SAME: ptr {{[^%]*}}%[[res:.*]], ptr {{[^%]*}}%[[arg0:.*]], i64 %[[arg1:.*]])
 func.func @b1(%arg0 : !fir.ref<!fir.char<1,?>>, %arg1 : index) -> !fir.box<!fir.char<1,?>> {
   // CHECK: %[[alloca:.*]] = alloca { ptr, i64, i32, i8, i8, i8, i8 }
-  // CHECK: %[[size:.*]] = mul i64 1, %[[arg1]]
+  // CHECK: %[[size:.*]] = mul i64 %[[arg1]], 1
   // CHECK: insertvalue {{.*}} undef, i64 %[[size]], 1
   // CHECK: insertvalue {{.*}} i32 20240719, 2
   // CHECK: insertvalue {{.*}} ptr %[[arg0]], 0
@@ -89,7 +89,7 @@ func.func @b2(%arg0 : !fir.ref<!fir.array<?x!fir.char<1,5>>>, %arg1 : index) ->
 func.func @b3(%arg0 : !fir.ref<!fir.array<?x!fir.char<1,?>>>, %arg1 : index, %arg2 : index) -> !fir.box<!fir.array<?x!fir.char<1,?>>> {
   %1 = fir.shape %arg2 : (index) -> !fir.shape<1>
   // CHECK: %[[alloca:.*]] = alloca { ptr, i64, i32, i8, i8, i8, i8, [1 x [3 x i64]] }
-  // CHECK: %[[size:.*]] = mul i64 1, %[[arg1]]
+  // CHECK: %[[size:.*]] = mul i64 %[[arg1]], 1
   // CHECK: insertvalue {{.*}} i64 %[[size]], 1
   // CHECK: insertvalue {{.*}} i32 20240719, 2
   // CHECK: insertvalue {{.*}} i64 %[[arg2]], 7, 0, 1
@@ -108,7 +108,7 @@ func.func @b4(%arg0 : !fir.ref<!fir.array<7x!fir.char<1,?>>>, %arg1 : index) ->
   %c_7 = arith.constant 7 : index
   %1 = fir.shape %c_7 : (index) -> !fir.shape<1>
   // CHECK: %[[alloca:.*]] = alloca { ptr, i64, i32, i8, i8, i8, i8, [1 x [3 x i64]] }
-  // CHECK:   %[[size:.*]] = mul i64 1, %[[arg1]]
+  // CHECK:   %[[size:.*]] = mul i64 %[[arg1]], 1
   // CHECK: insertvalue {{.*}} i64 %[[size]], 1
   // CHECK: insertvalue {{.*}} i32 20240719, 2
   // CHECK: insertvalue {{.*}} i64 7, 7, 0, 1
diff --git a/flang/test/Fir/boxproc.fir b/flang/test/Fir/boxproc.fir
index 97d9b38ed6f40..d4c36a4f5b213 100644
--- a/flang/test/Fir/boxproc.fir
+++ b/flang/test/Fir/boxproc.fir
@@ -82,12 +82,8 @@ func.func @_QPtest_proc_dummy_other(%arg0: !fir.boxproc<() -> ()>) {
 // CHECK:         store [1 x i8] c" ", ptr %[[VAL_18]], align 1
 // CHECK:         call void @llvm.init.trampoline(ptr %[[VAL_20]], ptr @_QFtest_proc_dummy_charPgen_message, ptr %[[VAL_2]])
 // CHECK:         %[[VAL_23:.*]] = call ptr @llvm.adjust.trampoline(ptr %[[VAL_20]])
-// CHECK:         %[[VAL_25:.*]] = insertvalue { ptr, i64 } undef, ptr %[[VAL_23]], 0
-// CHECK:         %[[VAL_26:.*]] = insertvalue { ptr, i64 } %[[VAL_25]], i64 10, 1
 // CHECK:         %[[VAL_27:.*]] = call ptr @llvm.stacksave.p0()
-// CHECK:         %[[VAL_28:.*]] = extractvalue { ptr, i64 } %[[VAL_26]], 0
-// CHECK:         %[[VAL_29:.*]] = extractvalue { ptr, i64 } %[[VAL_26]], 1
-// CHECK:         %[[VAL_30:.*]] = call { ptr, i64 } @_QPget_message(ptr %[[VAL_0]], i64 40, ptr %[[VAL_28]], i64 %[[VAL_29]])
+// CHECK:         %[[VAL_30:.*]] = call { ptr, i64 } @_QPget_message(ptr %[[VAL_0]], i64 40, ptr %[[VAL_23]], i64 10)
 // CHECK:         %[[VAL_32:.*]] = call i1 @_FortranAioOutputAscii(ptr %{{.*}}, ptr %[[VAL_0]], i64 40)
 // CHECK:         call void @llvm.stackrestore.p0(ptr %[[VAL_27]])
 
@@ -115,14 +111,10 @@ func.func @_QPtest_proc_dummy_other(%arg0: !fir.boxproc<() -> ()>) {
 // CHECK-LABEL: define { ptr, i64 } @_QPget_message(ptr
 // CHECK-SAME:                  %[[VAL_0:.*]], i64 %[[VAL_1:.*]], ptr %[[VAL_2:.*]], i64
 // CHECK-SAME:                                                 %[[VAL_3:.*]])
-// CHECK:         %[[VAL_4:.*]] = insertvalue { ptr, i64 } undef, ptr %[[VAL_2]], 0
-// CHECK:         %[[VAL_5:.*]] = insertvalue { ptr, i64 } %[[VAL_4]], i64 %[[VAL_3]], 1
-// CHECK:         %[[VAL_7:.*]] = extractvalue { ptr, i64 } %[[VAL_5]], 0
-// CHECK:         %[[VAL_8:.*]] = extractvalue { ptr, i64 } %[[VAL_5]], 1
 // CHECK:         %[[VAL_9:.*]] = call ptr @llvm.stacksave.p0()
-// CHECK:         %[[VAL_10:.*]] = alloca i8, i64 %[[VAL_8]], align 1
-// CHECK:         %[[VAL_12:.*]] = call { ptr, i64 } %[[VAL_7]](ptr %[[VAL_10]], i64 %[[VAL_8]])
-// CHECK:         %[[VAL_13:.*]] = add i64 %[[VAL_8]], 12
+// CHECK:         %[[VAL_10:.*]] = alloca i8, i64 %[[VAL_3]], align 1
+// CHECK:         %[[VAL_12:.*]] = call { ptr, i64 } %[[VAL_2]](ptr %[[VAL_10]], i64 %[[VAL_3]])
+// CHECK:         %[[VAL_13:.*]] = add i64 %[[VAL_3]], 12
 // CHECK:         %[[VAL_14:.*]] = alloca i8, i64 %[[VAL_13]], align 1
 // CHECK:         call void @llvm.memmove.p0.p0.i64(ptr %[[VAL_14]], ptr {{.*}}, i64 12, i1 false)
 // CHECK:         %[[VAL_18:.*]] = phi i64
diff --git a/flang/test/Fir/embox.fir b/flang/test/Fir/embox.fir
index 0f304cff2c79e..11f7457b6873c 100644
--- a/flang/test/Fir/embox.fir
+++ b/flang/test/Fir/embox.fir
@@ -11,7 +11,7 @@ func.func @_QPtest_callee(%arg0: !fir.box<!fir.array<?xi32>>) {
 func.func @_QPtest_slice() {
 // CHECK:  %[[a1:.*]] = alloca { ptr, i64, i32, i8, i8, i8, i8, [1 x [3 x i64]] }, align 8
 // CHECK:  %[[a2:.*]] = alloca [20 x i32], i64 1, align 4
-// CHECK:  %[[a3:.*]] = getelementptr [20 x i32], ptr %[[a2]], i64 0, i64 0
+// CHECK:  %[[a3:.*]] = getelementptr [20 x i32], ptr %[[a2]], i32 0, i64 0
 // CHECK:  %[[a4:.*]] = insertvalue { ptr, i64, i32, i8, i8, i8, i8, [1 x [3 x i64]] }
 // CHECK:  { ptr undef, i64 4, i32 20240719, i8 1, i8 9, i8 0, i8 0, [1 x [3 x i64]]
 // CHECK: [i64 1, i64 5, i64 8]] }, ptr %[[a3]], 0
@@ -38,7 +38,7 @@ func.func @_QPtest_dt_callee(%arg0: !fir.box<!fir.array<?xi32>>) {
 func.func @_QPtest_dt_slice() {
 // CHECK:  %[[a1:.*]] = alloca { ptr, i64, i32, i8, i8, i8, i8, [1 x [3 x i64]] }, align 8
 // CHECK:  %[[a3:.*]] = alloca [20 x %_QFtest_dt_sliceTt], i64 1, align 8
-// CHECK:  %[[a4:.*]] = getelementptr [20 x %_QFtest_dt_sliceTt], ptr %[[a3]], i64 0, i64 0, i32 0
+// CHECK:  %[[a4:.*]] = getelementptr [20 x %_QFtest_dt_sliceTt], ptr %[[a3]], i32 0, i64 0, i32 0
 // CHECK: %[[a5:.*]] = insertvalue { ptr, i64, i32, i8, i8, i8, i8, [1 x [3 x i64]] }
 // CHECK-SAME: { ptr undef, i64 4, i32 20240719, i8 1, i8 9, i8 0, i8 0, [1 x [3 x i64]]
 // CHECK-SAME: [i64 1, i64 5, i64 16
@@ -73,7 +73,7 @@ func.func @emboxSubstring(%arg0: !fir.ref<!fir.array<2x3x!fir.char<1,4>>>) {
   %0 = fir.shape %c2, %c3 : (index, index) -> !fir.shape<2>
   %1 = fir.slice %c1, %c2, %c1, %c1, %c3, %c1 substr %c1_i64, %c2_i64 : (index, index, index, index, index, index, i64, i64) -> !fir.slice<2>
   %2 = fir.embox %arg0(%0) [%1] : (!fir.ref<!fir.array<2x3x!fir.char<1,4>>>, !fir.shape<2>, !fir.slice<2>) -> !fir.box<!fir.array<?x?x!fir.char<1,?>>>
-  // CHECK: %[[addr:.*]] = getelementptr [3 x [2 x [4 x i8]]], ptr %[[arg0]], i64 0, i64 0, i64 0, i64 1
+  // CHECK: %[[addr:.*]] = getelementptr [3 x [2 x [4 x i8]]], ptr %[[arg0]], i32 0, i64 0, i64 0, i32 1
   // CHECK: insertvalue {[[descriptorType:.*]]} { ptr undef, i64 2, i32 20240719, i8 2, i8 40, i8 0, i8 0
   // CHECK-SAME: [2 x [3 x i64]] [{{\[}}3 x i64] [i64 1, i64 2, i64 4], [3 x i64] [i64 1, i64 3, i64 8]] }
   // CHECK-SAME: ptr %[[addr]], 0
diff --git a/flang/test/Fir/omp-reduction-embox-codegen.fir b/flang/test/Fir/omp-reduction-embox-codegen.fir
index 1645e1a407ad4..e517b1352ff5c 100644
--- a/flang/test/Fir/omp-reduction-embox-codegen.fir
+++ b/flang/test/Fir/omp-reduction-embox-codegen.fir
@@ -23,14 +23,14 @@ omp.declare_reduction @test_reduction : !fir.ref<!fir.box<i32>> init {
   omp.yield(%0 : !fir.ref<!fir.box<i32>>)
 }
 
-func.func @_QQmain() attributes {fir.bindc_name = "reduce"} {
+func.func @_QQmain()  -> !fir.ref<!fir.box<i32>> attributes {fir.bindc_name = "reduce"} {
   %4 = fir.alloca !fir.box<i32>
   omp.parallel reduction(byref @test_reduction %4 -> %arg0 : !fir.ref<!fir.box<i32>>) {
     omp.terminator
   }
-  return
+  return %4: !fir.ref<!fir.box<i32>>
 }
 
 // basically we are testing that there isn't a crash
-// CHECK-LABEL: define void @_QQmain
+// CHECK-LABEL: define ptr @_QQmain
 // CHECK-NEXT:    alloca { ptr, i64, i32, i8, i8, i8, i8 }, i64 1, align 8
diff --git a/flang/test/Fir/optional.fir b/flang/test/Fir/optional.fir
index bded8b5332a30..66ff69f083467 100644
--- a/flang/test/Fir/optional.fir
+++ b/flang/test/Fir/optional.fir
@@ -37,8 +37,7 @@ func.func @bar2() -> i1 {
 
 // CHECK-LABEL: @foo3
 func.func @foo3(%arg0: !fir.boxchar<1>) -> i1 {
-  // CHECK: %[[extract:.*]] = extractvalue { ptr, i64 } %{{.*}}, 0
-  // CHECK: %[[ptr:.*]] = ptrtoint ptr %[[extract]] to i64
+  // CHECK: %[[ptr:.*]] = ptrtoint ptr %0 to i64
   // CHECK: icmp ne i64 %[[ptr]], 0
   %0 = fir.is_present %arg0 : (!fir.boxchar<1>) -> i1
   return %0 : i1
diff --git a/flang/test/Fir/pdt.fir b/flang/test/Fir/pdt.fir
index a200cd7e7cc03..411927aae6bdf 100644
--- a/flang/test/Fir/pdt.fir
+++ b/flang/test/Fir/pdt.fir
@@ -96,13 +96,13 @@ func.func @_QTt1P.f2.offset(%0 : i32, %1 : i32) -> i32 {
 
 func.func private @bar(!fir.ref<!fir.char<1,?>>)
 
-// CHECK-LABEL: define void @_QPfoo(i32 %0, i32 %1)
-func.func @_QPfoo(%arg0 : i32, %arg1 : i32) {
+// CHECK-LABEL: define ptr @_QPfoo(i32 %0, i32 %1)
+func.func @_QPfoo(%arg0 : i32, %arg1 : i32) -> !fir.ref<!fir.type<_QTt1>> {
   // CHECK: %[[size:.*]] = call i64 @_QTt1P.mem.size(i32 %0, i32 %1)
   // CHECK: %[[alloc:.*]] = alloca i8, i64 %[[size]]
   %0 = fir.alloca !fir.type<_QTt1(p1:i32,p2:i32){f1:!fir.char<1,?>,f2:!fir.char<1,?>}>(%arg0, %arg1 : i32, i32)
   //%2 = fir.coordinate_of %0, f2 : (!fir.ref<!fir.type<_QTt1>>) -> !fir.ref<!fir.char<1,?>>
   %2 = fir.zero_bits !fir.ref<!fir.char<1,?>>
   fir.call @bar(%2) : (!fir.ref<!fir.char<1,?>>) -> ()
-  return
+  return %0 : !fir.ref<!fir.type<_QTt1>>
 }
diff --git a/flang/test/Fir/rebox.fir b/flang/test/Fir/rebox.fir
index 0c9f6d9bb94ad..d858adfb7c45d 100644
--- a/flang/test/Fir/rebox.fir
+++ b/flang/test/Fir/rebox.fir
@@ -36,7 +36,7 @@ func.func @test_rebox_1(%arg0: !fir.box<!fir.array<?x?xf32>>) {
   // CHECK: %[[VOIDBASE0:.*]] = getelementptr i8, ptr %[[INBASE]], i64 %[[OFFSET_0]]
   // CHECK: %[[OFFSET_1:.*]] = mul i64 2, %[[INSTRIDE_1]]
   // CHECK: %[[VOIDBASE1:.*]] = getelementptr i8, ptr %[[VOIDBASE0]], i64 %[[OFFSET_1]]
-  // CHECK: %[[OUTSTRIDE0:.*]] = mul i64 3, %[[INSTRIDE_1]]
+  // CHECK: %[[OUTSTRIDE0:.*]] = mul i64 %[[INSTRIDE_1]], 3
   // CHECK: %[[OUTBOX1:.*]] = insertvalue { ptr, i64, i32, i8, i8, i8, i8, [1 x [3 x i64]] } %{{.*}}, i64 %[[OUTSTRIDE0]], 7, 0, 2
   // CHECK: %[[OUTBOX2:.*]] = insertvalue { ptr, i64, i32, i8, i8, i8, i8, [1 x [3 x i64]] } %[[OUTBOX1]], ptr %[[VOIDBASE1]], 0
   // CHECK: store { ptr, i64, i32, i8, i8, i8, i8, [1 x [3 x i64]] } %[[OUTBOX2]], ptr %[[OUTBOX_ALLOC]], align 8
@@ -63,7 +63,7 @@ func.func @test_rebox_2(%arg0: !fir.box<!fir.array<?x?x!fir.char<1,?>>>) {
   // CHECK: %[[OUTBOX:.*]] = alloca { ptr, i64, i32, i8, i8, i8, i8, [2 x [3 x i64]] }
   // CHECK: %[[LEN_GEP:.*]] = getelementptr { ptr, i64, i32, i8, i8, i8, i8, [2 x [3 x i64]] }, ptr %[[INBOX]], i32 0, i32 1
   // CHECK: %[[LEN:...
[truncated]

llvmbot · 2025-08-26T03:21:44Z

@llvm/pr-subscribers-mlir-openmp

Author: Pranav Bhandarkar (bhandarkar-pranav)

Changes

This PR adds support for translating the private clause on deferred target tasks, that is, omp.target operations with the nowait clause.

An offloading call for a deferred target task is non-blocking: the offloading host task continues its execution after issuing the call. The key problem is therefore to ensure that the data needed to initialize private variables in the target task persists even after the host task has completed.
We do this in a new pass called PrepareForOMPOffloadPrivatizationPass. For a privatized variable that needs its host counterpart for initialization (such as the shape of the data from the descriptor when an allocatable is privatized, or the value of the data when an allocatable is firstprivatized):

The pass allocates memory on the heap.
It then initializes this memory by copying the contents of the host variable to the newly allocated location on the heap.
Then, the pass updates all the omp.map.info operations that pointed to the host variable so that they point to the copy located on the heap.

The pass uses a rewrite pattern applied with the greedy pattern rewrite driver, which also performs some constant folding and DCE. As a result, a number of lit tests had to be updated: in GEPs, constants get folded into the indices and truncated to i32, and in some tests sequences of insertvalue and extractvalue instructions cancel out.

Patch is 79.33 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/155348.diff

30 Files Affected:

(modified) flang/include/flang/Optimizer/Passes/Pipelines.h (+1)
(modified) flang/lib/Optimizer/Passes/Pipelines.cpp (+6)
(modified) flang/test/Driver/tco-emit-final-mlir.fir (+2-2)
(modified) flang/test/Driver/tco-test-gen.fir (+2-3)
(modified) flang/test/Fir/alloc-32.fir (+1-1)
(modified) flang/test/Fir/alloc.fir (+9-8)
(modified) flang/test/Fir/arrexp.fir (+2-2)
(modified) flang/test/Fir/basic-program.fir (+2)
(modified) flang/test/Fir/box.fir (+3-3)
(modified) flang/test/Fir/boxproc.fir (+4-12)
(modified) flang/test/Fir/embox.fir (+3-3)
(modified) flang/test/Fir/omp-reduction-embox-codegen.fir (+3-3)
(modified) flang/test/Fir/optional.fir (+1-2)
(modified) flang/test/Fir/pdt.fir (+3-3)
(modified) flang/test/Fir/rebox.fir (+9-9)
(modified) flang/test/Fir/select.fir (+1-1)
(modified) flang/test/Fir/target.fir (-4)
(modified) flang/test/Fir/tbaa-codegen2.fir (+3-9)
(modified) flang/test/Integration/OpenMP/map-types-and-sizes.f90 (+7-7)
(modified) flang/test/Lower/allocatable-polymorphic.f90 (-4)
(modified) flang/test/Lower/forall/character-1.f90 (+2-2)
(added) mlir/include/mlir/Dialect/LLVMIR/Transforms/OpenMPOffloadPrivatizationPrepare.h (+23)
(modified) mlir/include/mlir/Dialect/LLVMIR/Transforms/Passes.td (+12)
(modified) mlir/include/mlir/Dialect/OpenMP/OpenMPOps.td (+2-2)
(modified) mlir/lib/Dialect/LLVMIR/Transforms/CMakeLists.txt (+2)
(added) mlir/lib/Dialect/LLVMIR/Transforms/OpenMPOffloadPrivatizationPrepare.cpp (+425)
(modified) mlir/lib/Target/LLVMIR/Dialect/OpenMP/OpenMPToLLVMIRTranslation.cpp (+2-9)
(modified) mlir/lib/Tools/mlir-opt/MlirOptMain.cpp (+1)
(added) mlir/test/Dialect/LLVMIR/omp-offload-privatization-prepare.mlir (+167)
(modified) mlir/test/Target/LLVMIR/openmp-todo.mlir (-18)

diff --git a/flang/include/flang/Optimizer/Passes/Pipelines.h b/flang/include/flang/Optimizer/Passes/Pipelines.h
index a3f59ee8dd013..17d48f46e4b9b 100644
--- a/flang/include/flang/Optimizer/Passes/Pipelines.h
+++ b/flang/include/flang/Optimizer/Passes/Pipelines.h
@@ -22,6 +22,7 @@
 #include "mlir/Conversion/SCFToControlFlow/SCFToControlFlow.h"
 #include "mlir/Dialect/GPU/IR/GPUDialect.h"
 #include "mlir/Dialect/LLVMIR/LLVMAttrs.h"
+#include "mlir/Dialect/LLVMIR/Transforms/OpenMPOffloadPrivatizationPrepare.h"
 #include "mlir/Pass/PassManager.h"
 #include "mlir/Transforms/GreedyPatternRewriteDriver.h"
 #include "mlir/Transforms/Passes.h"
diff --git a/flang/lib/Optimizer/Passes/Pipelines.cpp b/flang/lib/Optimizer/Passes/Pipelines.cpp
index ca8e820608688..6a11461cd8380 100644
--- a/flang/lib/Optimizer/Passes/Pipelines.cpp
+++ b/flang/lib/Optimizer/Passes/Pipelines.cpp
@@ -403,6 +403,12 @@ void createMLIRToLLVMPassPipeline(mlir::PassManager &pm,
 
   // Add codegen pass pipeline.
   fir::createDefaultFIRCodeGenPassPipeline(pm, config, inputFilename);
+
+  // Run a pass to prepare for translation of delayed privatization in the
+  // context of deferred target tasks.
+  addNestedPassConditionally<mlir::LLVM::LLVMFuncOp>(pm, disableFirToLlvmIr,[&]() {
+    return mlir::LLVM::createPrepareForOMPOffloadPrivatizationPass();
+  });
 }
 
 } // namespace fir
diff --git a/flang/test/Driver/tco-emit-final-mlir.fir b/flang/test/Driver/tco-emit-final-mlir.fir
index 75f8f153127af..177810cf41378 100644
--- a/flang/test/Driver/tco-emit-final-mlir.fir
+++ b/flang/test/Driver/tco-emit-final-mlir.fir
@@ -13,7 +13,7 @@
 // CHECK: llvm.return
 // CHECK-NOT: func.func
 
-func.func @_QPfoo() {
+func.func @_QPfoo() -> !fir.ref<i32> {
   %1 = fir.alloca i32
-  return
+  return %1 : !fir.ref<i32>
 }
diff --git a/flang/test/Driver/tco-test-gen.fir b/flang/test/Driver/tco-test-gen.fir
index 38d4e50ecf3aa..15483f7ee3534 100644
--- a/flang/test/Driver/tco-test-gen.fir
+++ b/flang/test/Driver/tco-test-gen.fir
@@ -42,11 +42,10 @@ func.func @_QPtest(%arg0: !fir.ref<i32> {fir.bindc_name = "num"}, %arg1: !fir.re
 // CHECK-SAME:      %[[ARG2:.*]]: !llvm.ptr {fir.bindc_name = "ub", llvm.nocapture},
 // CHECK-SAME:      %[[ARG3:.*]]: !llvm.ptr {fir.bindc_name = "step", llvm.nocapture}) {
 
+// CMPLX:           %[[VAL_3:.*]] = llvm.mlir.constant(0 : index) : i64
+// CMPLX:           %[[VAL_2:.*]] = llvm.mlir.constant(1 : index) : i64
 // CMPLX:           %[[VAL_0:.*]] = llvm.mlir.constant(1 : i64) : i64
 // CMPLX:           %[[VAL_1:.*]] = llvm.alloca %[[VAL_0]] x i32 {bindc_name = "i"} : (i64) -> !llvm.ptr
-// CMPLX:           %[[VAL_2:.*]] = llvm.mlir.constant(1 : index) : i64
-// CMPLX:           %[[VAL_3:.*]] = llvm.mlir.constant(0 : index) : i64
-// CMPLX:           %[[VAL_4:.*]] = llvm.mlir.constant(1 : i64) : i64
 
 // SIMPLE:          %[[VAL_3:.*]] = llvm.mlir.constant(0 : index) : i64
 // SIMPLE:          %[[VAL_2:.*]] = llvm.mlir.constant(1 : index) : i64
diff --git a/flang/test/Fir/alloc-32.fir b/flang/test/Fir/alloc-32.fir
index a3cbf200c24fc..f57f6ce6fcf5e 100644
--- a/flang/test/Fir/alloc-32.fir
+++ b/flang/test/Fir/alloc-32.fir
@@ -19,7 +19,7 @@ func.func @allocmem_scalar_nonchar() -> !fir.heap<i32> {
 // CHECK-LABEL: define ptr @allocmem_scalar_dynchar(
 // CHECK-SAME: i32 %[[len:.*]])
 // CHECK: %[[mul1:.*]] = sext i32 %[[len]] to i64
-// CHECK: %[[mul2:.*]] = mul i64 1, %[[mul1]]
+// CHECK: %[[mul2:.*]] = mul i64 %[[mul1]], 1
 // CHECK: %[[cmp:.*]] = icmp sgt i64 %[[mul2]], 0
 // CHECK: %[[sz:.*]] = select i1 %[[cmp]], i64 %[[mul2]], i64 1
 // CHECK: %[[trunc:.*]] = trunc i64 %[[sz]] to i32
diff --git a/flang/test/Fir/alloc.fir b/flang/test/Fir/alloc.fir
index 8da8b828c18b9..0d3ce323d0d7c 100644
--- a/flang/test/Fir/alloc.fir
+++ b/flang/test/Fir/alloc.fir
@@ -86,7 +86,7 @@ func.func @alloca_scalar_dynchar_kind(%l : i32) -> !fir.ref<!fir.char<2,?>> {
 // CHECK-LABEL: define ptr @allocmem_scalar_dynchar(
 // CHECK-SAME: i32 %[[len:.*]])
 // CHECK: %[[mul1:.*]] = sext i32 %[[len]] to i64
-// CHECK: %[[mul2:.*]] = mul i64 1, %[[mul1]]
+// CHECK: %[[mul2:.*]] = mul i64 %[[mul1]], 1
 // CHECK: %[[cmp:.*]] = icmp sgt i64 %[[mul2]], 0
 // CHECK: %[[size:.*]] = select i1 %[[cmp]], i64 %[[mul2]], i64 1
 // CHECK: call ptr @malloc(i64 %[[size]])
@@ -98,7 +98,7 @@ func.func @allocmem_scalar_dynchar(%l : i32) -> !fir.heap<!fir.char<1,?>> {
 // CHECK-LABEL: define ptr @allocmem_scalar_dynchar_kind(
 // CHECK-SAME: i32 %[[len:.*]])
 // CHECK: %[[mul1:.*]] = sext i32 %[[len]] to i64
-// CHECK: %[[mul2:.*]] = mul i64 2, %[[mul1]]
+// CHECK: %[[mul2:.*]] = mul i64 %[[mul1]], 2
 // CHECK: %[[cmp:.*]] = icmp sgt i64 %[[mul2]], 0
 // CHECK: %[[size:.*]] = select i1 %[[cmp]], i64 %[[mul2]], i64 1
 // CHECK: call ptr @malloc(i64 %[[size]])
@@ -185,7 +185,7 @@ func.func @alloca_dynarray_of_nonchar2(%e: index) -> !fir.ref<!fir.array<?x?xi32
 
 // CHECK-LABEL: define ptr @allocmem_dynarray_of_nonchar(
 // CHECK-SAME: i64 %[[extent:.*]])
-// CHECK: %[[prod1:.*]] = mul i64 12, %[[extent]]
+// CHECK: %[[prod1:.*]] = mul i64 %[[extent]], 12
 // CHECK: %[[cmp:.*]] = icmp sgt i64 %[[prod1]], 0
 // CHECK: %[[size:.*]] = select i1 %[[cmp]], i64 %[[prod1]], i64 1
 // CHECK: call ptr @malloc(i64 %[[size]])
@@ -196,7 +196,7 @@ func.func @allocmem_dynarray_of_nonchar(%e: index) -> !fir.heap<!fir.array<3x?xi
 
 // CHECK-LABEL: define ptr @allocmem_dynarray_of_nonchar2(
 // CHECK-SAME: i64 %[[extent:.*]])
-// CHECK: %[[prod1:.*]] = mul i64 4, %[[extent]]
+// CHECK: %[[prod1:.*]] = mul i64 %[[extent]], 4
 // CHECK: %[[prod2:.*]] = mul i64 %[[prod1]], %[[extent]]
 // CHECK: %[[cmp:.*]] = icmp sgt i64 %[[prod2]], 0
 // CHECK: %[[size:.*]] = select i1 %[[cmp]], i64 %[[prod2]], i64 1
@@ -227,7 +227,7 @@ func.func @alloca_dynarray_of_char2(%e : index) -> !fir.ref<!fir.array<?x?x!fir.
 
 // CHECK-LABEL: define ptr @allocmem_dynarray_of_char(
 // CHECK-SAME: i64 %[[extent:.*]])
-// CHECK: %[[prod1:.*]] = mul i64 60, %[[extent]]
+// CHECK: %[[prod1:.*]] = mul i64 %[[extent]], 60
 // CHECK: %[[cmp:.*]] = icmp sgt i64 %[[prod1]], 0
 // CHECK: %[[size:.*]] = select i1 %[[cmp]], i64 %[[prod1]], i64 1
 // CHECK: call ptr @malloc(i64 %[[size]])
@@ -238,7 +238,7 @@ func.func @allocmem_dynarray_of_char(%e : index) -> !fir.heap<!fir.array<3x?x!fi
 
 // CHECK-LABEL: define ptr @allocmem_dynarray_of_char2(
 // CHECK-SAME: i64 %[[extent:.*]])
-// CHECK: %[[prod1:.*]] = mul i64 20, %[[extent]]
+// CHECK: %[[prod1:.*]] = mul i64 %[[extent]], 20
 // CHECK: %[[prod2:.*]] = mul i64 %[[prod1]], %[[extent]]
 // CHECK: %[[cmp:.*]] = icmp sgt i64 %[[prod2]], 0
 // CHECK: %[[size:.*]] = select i1 %[[cmp]], i64 %[[mul2]], i64 1
@@ -286,7 +286,7 @@ func.func @allocmem_dynarray_of_dynchar(%l: i32, %e : index) -> !fir.heap<!fir.a
 // CHECK-LABEL: define ptr @allocmem_dynarray_of_dynchar2(
 // CHECK-SAME: i32 %[[len:.*]], i64 %[[extent:.*]])
 // CHECK: %[[a:.*]] = sext i32 %[[len]] to i64
-// CHECK: %[[prod1:.*]] = mul i64 2, %[[a]]
+// CHECK: %[[prod1:.*]] = mul i64 %[[a]], 2
 // CHECK: %[[prod2:.*]] = mul i64 %[[prod1]], %[[extent]]
 // CHECK: %[[prod3:.*]] = mul i64 %[[prod2]], %[[extent]]
 // CHECK: %[[cmp:.*]] = icmp sgt i64 %[[prod3]], 0
@@ -366,12 +366,13 @@ func.func @allocmem_array_with_holes_dynchar(%arg0: index, %arg1: index) -> !fir
 // CHECK:    %[[VAL_0:.*]] = alloca { ptr, i64, i32, i8, i8, i8, i8, ptr, [1 x i64] }, i64 1
 // CHECK:    %[[VAL_3:.*]] = alloca { ptr, i64, i32, i8, i8, i8, i8, [1 x [3 x i64]], ptr, [1 x i64] }, i64 1
 // CHECK:    %[[VAL_2:.*]] = alloca { ptr, i64, i32, i8, i8, i8, i8, ptr, [1 x i64] }, i64 1
-
+func.func private @foo(%0: !fir.ref<!fir.class<none>>, %1: !fir.ref<!fir.class<!fir.array<?xnone>>>, %2: !fir.ref<!fir.box<none>>, %3: !fir.ref<!fir.box<!fir.array<?xnone>>>)
 func.func @alloca_unlimited_polymorphic_box() {
   %0 = fir.alloca !fir.class<none>
   %1 = fir.alloca !fir.class<!fir.array<?xnone>>
   %2 = fir.alloca !fir.box<none>
   %3 = fir.alloca !fir.box<!fir.array<?xnone>>
+  fir.call @foo(%0, %1, %2, %3) : (!fir.ref<!fir.class<none>>, !fir.ref<!fir.class<!fir.array<?xnone>>>, !fir.ref<!fir.box<none>>, !fir.ref<!fir.box<!fir.array<?xnone>>>) -> ()
   return
 }
 // Note: allocmem of fir.box are not possible (fir::HeapType::verify does not
diff --git a/flang/test/Fir/arrexp.fir b/flang/test/Fir/arrexp.fir
index e8ec8ac79e0c2..2eb717228d998 100644
--- a/flang/test/Fir/arrexp.fir
+++ b/flang/test/Fir/arrexp.fir
@@ -143,9 +143,9 @@ func.func @f6(%arg0: !fir.box<!fir.array<?xf32>>, %arg1: f32) {
   %c9 = arith.constant 9 : index
   %c10 = arith.constant 10 : index
 
-  // CHECK: %[[EXT_GEP:.*]] = getelementptr {{.*}} %[[A]], i32 0, i32 7, i64 0, i32 1
+  // CHECK: %[[EXT_GEP:.*]] = getelementptr {{.*}} %[[A]], i32 0, i32 7, i32 0, i32 1
   // CHECK: %[[EXTENT:.*]] = load i64, ptr %[[EXT_GEP]]
-  // CHECK: %[[SIZE:.*]] = mul i64 4, %[[EXTENT]]
+  // CHECK: %[[SIZE:.*]] = mul i64 %[[EXTENT]], 4
   // CHECK: %[[CMP:.*]] = icmp sgt i64 %[[SIZE]], 0
   // CHECK: %[[SZ:.*]] = select i1 %[[CMP]], i64 %[[SIZE]], i64 1
   // CHECK: %[[MALLOC:.*]] = call ptr @malloc(i64 %[[SZ]])
diff --git a/flang/test/Fir/basic-program.fir b/flang/test/Fir/basic-program.fir
index c9fe53bf093a1..6bad03dded24d 100644
--- a/flang/test/Fir/basic-program.fir
+++ b/flang/test/Fir/basic-program.fir
@@ -158,4 +158,6 @@ func.func @_QQmain() {
 // PASSES-NEXT:  LowerNontemporalPass
 // PASSES-NEXT: FIRToLLVMLowering
 // PASSES-NEXT: ReconcileUnrealizedCasts
+// PASSES-NEXT: 'llvm.func' Pipeline
+// PASSES-NEXT: PrepareForOMPOffloadPrivatizationPass
 // PASSES-NEXT: LLVMIRLoweringPass
diff --git a/flang/test/Fir/box.fir b/flang/test/Fir/box.fir
index c0cf3d8375983..760fbd4792122 100644
--- a/flang/test/Fir/box.fir
+++ b/flang/test/Fir/box.fir
@@ -57,7 +57,7 @@ func.func @fa(%a : !fir.ref<!fir.array<100xf32>>) {
 // CHECK-SAME: ptr {{[^%]*}}%[[res:.*]], ptr {{[^%]*}}%[[arg0:.*]], i64 %[[arg1:.*]])
 func.func @b1(%arg0 : !fir.ref<!fir.char<1,?>>, %arg1 : index) -> !fir.box<!fir.char<1,?>> {
   // CHECK: %[[alloca:.*]] = alloca { ptr, i64, i32, i8, i8, i8, i8 }
-  // CHECK: %[[size:.*]] = mul i64 1, %[[arg1]]
+  // CHECK: %[[size:.*]] = mul i64 %[[arg1]], 1
   // CHECK: insertvalue {{.*}} undef, i64 %[[size]], 1
   // CHECK: insertvalue {{.*}} i32 20240719, 2
   // CHECK: insertvalue {{.*}} ptr %[[arg0]], 0
@@ -89,7 +89,7 @@ func.func @b2(%arg0 : !fir.ref<!fir.array<?x!fir.char<1,5>>>, %arg1 : index) ->
 func.func @b3(%arg0 : !fir.ref<!fir.array<?x!fir.char<1,?>>>, %arg1 : index, %arg2 : index) -> !fir.box<!fir.array<?x!fir.char<1,?>>> {
   %1 = fir.shape %arg2 : (index) -> !fir.shape<1>
   // CHECK: %[[alloca:.*]] = alloca { ptr, i64, i32, i8, i8, i8, i8, [1 x [3 x i64]] }
-  // CHECK: %[[size:.*]] = mul i64 1, %[[arg1]]
+  // CHECK: %[[size:.*]] = mul i64 %[[arg1]], 1
   // CHECK: insertvalue {{.*}} i64 %[[size]], 1
   // CHECK: insertvalue {{.*}} i32 20240719, 2
   // CHECK: insertvalue {{.*}} i64 %[[arg2]], 7, 0, 1
@@ -108,7 +108,7 @@ func.func @b4(%arg0 : !fir.ref<!fir.array<7x!fir.char<1,?>>>, %arg1 : index) ->
   %c_7 = arith.constant 7 : index
   %1 = fir.shape %c_7 : (index) -> !fir.shape<1>
   // CHECK: %[[alloca:.*]] = alloca { ptr, i64, i32, i8, i8, i8, i8, [1 x [3 x i64]] }
-  // CHECK:   %[[size:.*]] = mul i64 1, %[[arg1]]
+  // CHECK:   %[[size:.*]] = mul i64 %[[arg1]], 1
   // CHECK: insertvalue {{.*}} i64 %[[size]], 1
   // CHECK: insertvalue {{.*}} i32 20240719, 2
   // CHECK: insertvalue {{.*}} i64 7, 7, 0, 1
diff --git a/flang/test/Fir/boxproc.fir b/flang/test/Fir/boxproc.fir
index 97d9b38ed6f40..d4c36a4f5b213 100644
--- a/flang/test/Fir/boxproc.fir
+++ b/flang/test/Fir/boxproc.fir
@@ -82,12 +82,8 @@ func.func @_QPtest_proc_dummy_other(%arg0: !fir.boxproc<() -> ()>) {
 // CHECK:         store [1 x i8] c" ", ptr %[[VAL_18]], align 1
 // CHECK:         call void @llvm.init.trampoline(ptr %[[VAL_20]], ptr @_QFtest_proc_dummy_charPgen_message, ptr %[[VAL_2]])
 // CHECK:         %[[VAL_23:.*]] = call ptr @llvm.adjust.trampoline(ptr %[[VAL_20]])
-// CHECK:         %[[VAL_25:.*]] = insertvalue { ptr, i64 } undef, ptr %[[VAL_23]], 0
-// CHECK:         %[[VAL_26:.*]] = insertvalue { ptr, i64 } %[[VAL_25]], i64 10, 1
 // CHECK:         %[[VAL_27:.*]] = call ptr @llvm.stacksave.p0()
-// CHECK:         %[[VAL_28:.*]] = extractvalue { ptr, i64 } %[[VAL_26]], 0
-// CHECK:         %[[VAL_29:.*]] = extractvalue { ptr, i64 } %[[VAL_26]], 1
-// CHECK:         %[[VAL_30:.*]] = call { ptr, i64 } @_QPget_message(ptr %[[VAL_0]], i64 40, ptr %[[VAL_28]], i64 %[[VAL_29]])
+// CHECK:         %[[VAL_30:.*]] = call { ptr, i64 } @_QPget_message(ptr %[[VAL_0]], i64 40, ptr %[[VAL_23]], i64 10)
 // CHECK:         %[[VAL_32:.*]] = call i1 @_FortranAioOutputAscii(ptr %{{.*}}, ptr %[[VAL_0]], i64 40)
 // CHECK:         call void @llvm.stackrestore.p0(ptr %[[VAL_27]])
 
@@ -115,14 +111,10 @@ func.func @_QPtest_proc_dummy_other(%arg0: !fir.boxproc<() -> ()>) {
 // CHECK-LABEL: define { ptr, i64 } @_QPget_message(ptr
 // CHECK-SAME:                  %[[VAL_0:.*]], i64 %[[VAL_1:.*]], ptr %[[VAL_2:.*]], i64
 // CHECK-SAME:                                                 %[[VAL_3:.*]])
-// CHECK:         %[[VAL_4:.*]] = insertvalue { ptr, i64 } undef, ptr %[[VAL_2]], 0
-// CHECK:         %[[VAL_5:.*]] = insertvalue { ptr, i64 } %[[VAL_4]], i64 %[[VAL_3]], 1
-// CHECK:         %[[VAL_7:.*]] = extractvalue { ptr, i64 } %[[VAL_5]], 0
-// CHECK:         %[[VAL_8:.*]] = extractvalue { ptr, i64 } %[[VAL_5]], 1
 // CHECK:         %[[VAL_9:.*]] = call ptr @llvm.stacksave.p0()
-// CHECK:         %[[VAL_10:.*]] = alloca i8, i64 %[[VAL_8]], align 1
-// CHECK:         %[[VAL_12:.*]] = call { ptr, i64 } %[[VAL_7]](ptr %[[VAL_10]], i64 %[[VAL_8]])
-// CHECK:         %[[VAL_13:.*]] = add i64 %[[VAL_8]], 12
+// CHECK:         %[[VAL_10:.*]] = alloca i8, i64 %[[VAL_3]], align 1
+// CHECK:         %[[VAL_12:.*]] = call { ptr, i64 } %[[VAL_2]](ptr %[[VAL_10]], i64 %[[VAL_3]])
+// CHECK:         %[[VAL_13:.*]] = add i64 %[[VAL_3]], 12
 // CHECK:         %[[VAL_14:.*]] = alloca i8, i64 %[[VAL_13]], align 1
 // CHECK:         call void @llvm.memmove.p0.p0.i64(ptr %[[VAL_14]], ptr {{.*}}, i64 12, i1 false)
 // CHECK:         %[[VAL_18:.*]] = phi i64
diff --git a/flang/test/Fir/embox.fir b/flang/test/Fir/embox.fir
index 0f304cff2c79e..11f7457b6873c 100644
--- a/flang/test/Fir/embox.fir
+++ b/flang/test/Fir/embox.fir
@@ -11,7 +11,7 @@ func.func @_QPtest_callee(%arg0: !fir.box<!fir.array<?xi32>>) {
 func.func @_QPtest_slice() {
 // CHECK:  %[[a1:.*]] = alloca { ptr, i64, i32, i8, i8, i8, i8, [1 x [3 x i64]] }, align 8
 // CHECK:  %[[a2:.*]] = alloca [20 x i32], i64 1, align 4
-// CHECK:  %[[a3:.*]] = getelementptr [20 x i32], ptr %[[a2]], i64 0, i64 0
+// CHECK:  %[[a3:.*]] = getelementptr [20 x i32], ptr %[[a2]], i32 0, i64 0
 // CHECK:  %[[a4:.*]] = insertvalue { ptr, i64, i32, i8, i8, i8, i8, [1 x [3 x i64]] }
 // CHECK:  { ptr undef, i64 4, i32 20240719, i8 1, i8 9, i8 0, i8 0, [1 x [3 x i64]]
 // CHECK: [i64 1, i64 5, i64 8]] }, ptr %[[a3]], 0
@@ -38,7 +38,7 @@ func.func @_QPtest_dt_callee(%arg0: !fir.box<!fir.array<?xi32>>) {
 func.func @_QPtest_dt_slice() {
 // CHECK:  %[[a1:.*]] = alloca { ptr, i64, i32, i8, i8, i8, i8, [1 x [3 x i64]] }, align 8
 // CHECK:  %[[a3:.*]] = alloca [20 x %_QFtest_dt_sliceTt], i64 1, align 8
-// CHECK:  %[[a4:.*]] = getelementptr [20 x %_QFtest_dt_sliceTt], ptr %[[a3]], i64 0, i64 0, i32 0
+// CHECK:  %[[a4:.*]] = getelementptr [20 x %_QFtest_dt_sliceTt], ptr %[[a3]], i32 0, i64 0, i32 0
 // CHECK: %[[a5:.*]] = insertvalue { ptr, i64, i32, i8, i8, i8, i8, [1 x [3 x i64]] }
 // CHECK-SAME: { ptr undef, i64 4, i32 20240719, i8 1, i8 9, i8 0, i8 0, [1 x [3 x i64]]
 // CHECK-SAME: [i64 1, i64 5, i64 16
@@ -73,7 +73,7 @@ func.func @emboxSubstring(%arg0: !fir.ref<!fir.array<2x3x!fir.char<1,4>>>) {
   %0 = fir.shape %c2, %c3 : (index, index) -> !fir.shape<2>
   %1 = fir.slice %c1, %c2, %c1, %c1, %c3, %c1 substr %c1_i64, %c2_i64 : (index, index, index, index, index, index, i64, i64) -> !fir.slice<2>
   %2 = fir.embox %arg0(%0) [%1] : (!fir.ref<!fir.array<2x3x!fir.char<1,4>>>, !fir.shape<2>, !fir.slice<2>) -> !fir.box<!fir.array<?x?x!fir.char<1,?>>>
-  // CHECK: %[[addr:.*]] = getelementptr [3 x [2 x [4 x i8]]], ptr %[[arg0]], i64 0, i64 0, i64 0, i64 1
+  // CHECK: %[[addr:.*]] = getelementptr [3 x [2 x [4 x i8]]], ptr %[[arg0]], i32 0, i64 0, i64 0, i32 1
   // CHECK: insertvalue {[[descriptorType:.*]]} { ptr undef, i64 2, i32 20240719, i8 2, i8 40, i8 0, i8 0
   // CHECK-SAME: [2 x [3 x i64]] [{{\[}}3 x i64] [i64 1, i64 2, i64 4], [3 x i64] [i64 1, i64 3, i64 8]] }
   // CHECK-SAME: ptr %[[addr]], 0
diff --git a/flang/test/Fir/omp-reduction-embox-codegen.fir b/flang/test/Fir/omp-reduction-embox-codegen.fir
index 1645e1a407ad4..e517b1352ff5c 100644
--- a/flang/test/Fir/omp-reduction-embox-codegen.fir
+++ b/flang/test/Fir/omp-reduction-embox-codegen.fir
@@ -23,14 +23,14 @@ omp.declare_reduction @test_reduction : !fir.ref<!fir.box<i32>> init {
   omp.yield(%0 : !fir.ref<!fir.box<i32>>)
 }
 
-func.func @_QQmain() attributes {fir.bindc_name = "reduce"} {
+func.func @_QQmain()  -> !fir.ref<!fir.box<i32>> attributes {fir.bindc_name = "reduce"} {
   %4 = fir.alloca !fir.box<i32>
   omp.parallel reduction(byref @test_reduction %4 -> %arg0 : !fir.ref<!fir.box<i32>>) {
     omp.terminator
   }
-  return
+  return %4: !fir.ref<!fir.box<i32>>
 }
 
 // basically we are testing that there isn't a crash
-// CHECK-LABEL: define void @_QQmain
+// CHECK-LABEL: define ptr @_QQmain
 // CHECK-NEXT:    alloca { ptr, i64, i32, i8, i8, i8, i8 }, i64 1, align 8
diff --git a/flang/test/Fir/optional.fir b/flang/test/Fir/optional.fir
index bded8b5332a30..66ff69f083467 100644
--- a/flang/test/Fir/optional.fir
+++ b/flang/test/Fir/optional.fir
@@ -37,8 +37,7 @@ func.func @bar2() -> i1 {
 
 // CHECK-LABEL: @foo3
 func.func @foo3(%arg0: !fir.boxchar<1>) -> i1 {
-  // CHECK: %[[extract:.*]] = extractvalue { ptr, i64 } %{{.*}}, 0
-  // CHECK: %[[ptr:.*]] = ptrtoint ptr %[[extract]] to i64
+  // CHECK: %[[ptr:.*]] = ptrtoint ptr %0 to i64
   // CHECK: icmp ne i64 %[[ptr]], 0
   %0 = fir.is_present %arg0 : (!fir.boxchar<1>) -> i1
   return %0 : i1
diff --git a/flang/test/Fir/pdt.fir b/flang/test/Fir/pdt.fir
index a200cd7e7cc03..411927aae6bdf 100644
--- a/flang/test/Fir/pdt.fir
+++ b/flang/test/Fir/pdt.fir
@@ -96,13 +96,13 @@ func.func @_QTt1P.f2.offset(%0 : i32, %1 : i32) -> i32 {
 
 func.func private @bar(!fir.ref<!fir.char<1,?>>)
 
-// CHECK-LABEL: define void @_QPfoo(i32 %0, i32 %1)
-func.func @_QPfoo(%arg0 : i32, %arg1 : i32) {
+// CHECK-LABEL: define ptr @_QPfoo(i32 %0, i32 %1)
+func.func @_QPfoo(%arg0 : i32, %arg1 : i32) -> !fir.ref<!fir.type<_QTt1>> {
   // CHECK: %[[size:.*]] = call i64 @_QTt1P.mem.size(i32 %0, i32 %1)
   // CHECK: %[[alloc:.*]] = alloca i8, i64 %[[size]]
   %0 = fir.alloca !fir.type<_QTt1(p1:i32,p2:i32){f1:!fir.char<1,?>,f2:!fir.char<1,?>}>(%arg0, %arg1 : i32, i32)
   //%2 = fir.coordinate_of %0, f2 : (!fir.ref<!fir.type<_QTt1>>) -> !fir.ref<!fir.char<1,?>>
   %2 = fir.zero_bits !fir.ref<!fir.char<1,?>>
   fir.call @bar(%2) : (!fir.ref<!fir.char<1,?>>) -> ()
-  return
+  return %0 : !fir.ref<!fir.type<_QTt1>>
 }
diff --git a/flang/test/Fir/rebox.fir b/flang/test/Fir/rebox.fir
index 0c9f6d9bb94ad..d858adfb7c45d 100644
--- a/flang/test/Fir/rebox.fir
+++ b/flang/test/Fir/rebox.fir
@@ -36,7 +36,7 @@ func.func @test_rebox_1(%arg0: !fir.box<!fir.array<?x?xf32>>) {
   // CHECK: %[[VOIDBASE0:.*]] = getelementptr i8, ptr %[[INBASE]], i64 %[[OFFSET_0]]
   // CHECK: %[[OFFSET_1:.*]] = mul i64 2, %[[INSTRIDE_1]]
   // CHECK: %[[VOIDBASE1:.*]] = getelementptr i8, ptr %[[VOIDBASE0]], i64 %[[OFFSET_1]]
-  // CHECK: %[[OUTSTRIDE0:.*]] = mul i64 3, %[[INSTRIDE_1]]
+  // CHECK: %[[OUTSTRIDE0:.*]] = mul i64 %[[INSTRIDE_1]], 3
   // CHECK: %[[OUTBOX1:.*]] = insertvalue { ptr, i64, i32, i8, i8, i8, i8, [1 x [3 x i64]] } %{{.*}}, i64 %[[OUTSTRIDE0]], 7, 0, 2
   // CHECK: %[[OUTBOX2:.*]] = insertvalue { ptr, i64, i32, i8, i8, i8, i8, [1 x [3 x i64]] } %[[OUTBOX1]], ptr %[[VOIDBASE1]], 0
   // CHECK: store { ptr, i64, i32, i8, i8, i8, i8, [1 x [3 x i64]] } %[[OUTBOX2]], ptr %[[OUTBOX_ALLOC]], align 8
@@ -63,7 +63,7 @@ func.func @test_rebox_2(%arg0: !fir.box<!fir.array<?x?x!fir.char<1,?>>>) {
   // CHECK: %[[OUTBOX:.*]] = alloca { ptr, i64, i32, i8, i8, i8, i8, [2 x [3 x i64]] }
   // CHECK: %[[LEN_GEP:.*]] = getelementptr { ptr, i64, i32, i8, i8, i8, i8, [2 x [3 x i64]] }, ptr %[[INBOX]], i32 0, i32 1
   // CHECK: %[[LEN:...
[truncated]

llvmbot · 2025-08-26T03:21:44Z

@llvm/pr-subscribers-flang-openmp

Author: Pranav Bhandarkar (bhandarkar-pranav)

Changes

This PR adds support for translation of the private clause on deferred target tasks - that is omp.target operations with the nowait clause.

An offloading call for a deferred target-task is not blocking - the offloading host task continues its execution after issuing the offloading call. Therefore, the key problem we need to solve is to ensure that the data needed to initialize private variables in the target task persists even after the host task has completed.
We do this in a new pass called PrepareForOMPOffloadPrivatizationPass. For a privatized variable that needs its host counterpart for initialization (such as the shape of the data from the descriptor when an allocatable is privatized, or the value of the data when an allocatable is firstprivatized),

the pass allocates memory on the heap.
It then initializes this memory by copying the contents of the host variable to the newly allocated location on the heap.
Then, the pass updates all the omp.map.info operations that pointed to the host variable so that they point to the copy on the heap.
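
The lifetime problem behind these three steps can be sketched in plain C++, outside of MLIR. This is only an analogy under stated assumptions: `DeferredTask`, `makeDeferredTask` and `hostTask` are hypothetical names, a closure stands in for the deferred target task, and `std::shared_ptr` stands in for the explicit heap allocation. It is not the code the pass generates.

```cpp
#include <cassert>
#include <functional>
#include <memory>
#include <numeric>
#include <vector>

// Hypothetical stand-in for a deferred (nowait) target task: a closure
// that runs some time after the code that created it has returned.
using DeferredTask = std::function<int()>;

DeferredTask makeDeferredTask(const std::vector<int> &hostVar) {
  // Steps 1 and 2: allocate on the heap and copy the host variable there.
  auto heapCopy = std::make_shared<std::vector<int>>(hostVar);
  assert(heapCopy->size() == hostVar.size() && "copy must preserve the shape");
  // Step 3: the task refers to the heap copy, never to the host variable.
  return [heapCopy]() {
    return std::accumulate(heapCopy->begin(), heapCopy->end(), 0);
  };
}

DeferredTask hostTask() {
  // Lives on the stack of the host task and dies when hostTask() returns.
  std::vector<int> hostVar = {1, 2, 3, 4};
  return makeDeferredTask(hostVar);
}
```

Calling `hostTask()` and invoking the returned closure later is safe because the closure owns the copy. Capturing `hostVar` by reference instead would leave the task reading a dead stack slot, which is the situation the pass is meant to avoid.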

The pass uses a rewrite pattern applied with the greedy pattern rewrite driver, which in turn does some constant folding and DCE. Because of this, a number of lit tests had to be updated. In GEPs, constants get folded into the indices and truncated to i32. In some tests, sequences of insertvalue and extractvalue instructions cancel each other out, so those checks needed to be updated too.
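
Many of the updated CHECK lines in the diff differ only in operand order, for example `mul i64 1, %[[mul1]]` becoming `mul i64 %[[mul1]], 1`. That is consistent with the common canonical form for commutative operations, in which a constant operand is placed on the right-hand side. The toy model below illustrates that reordering only; `Operand`, `constantToRhs` and `printMul` are hypothetical names for this sketch and are not MLIR APIs.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Toy operand: either a constant or an SSA value, with its printed form.
struct Operand {
  bool isConstant;
  std::string text;
};

// Orders the operands of a commutative binary op so that a constant,
// if there is exactly one, ends up on the right-hand side.
std::pair<Operand, Operand> constantToRhs(Operand lhs, Operand rhs) {
  if (lhs.isConstant && !rhs.isConstant)
    std::swap(lhs, rhs);
  return {lhs, rhs};
}

// Prints the multiplication the way the CHECK lines spell it.
std::string printMul(const Operand &lhs, const Operand &rhs) {
  assert(!lhs.text.empty() && !rhs.text.empty() &&
         "operands must have a printed form");
  return "mul i64 " + lhs.text + ", " + rhs.text;
}
```

With the operands `1` and `%len`, the model prints the constant last, matching the shape of the new CHECK lines in alloc.fir and box.fir.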

Patch is 79.33 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/155348.diff

30 Files Affected:

(modified) flang/include/flang/Optimizer/Passes/Pipelines.h (+1)
(modified) flang/lib/Optimizer/Passes/Pipelines.cpp (+6)
(modified) flang/test/Driver/tco-emit-final-mlir.fir (+2-2)
(modified) flang/test/Driver/tco-test-gen.fir (+2-3)
(modified) flang/test/Fir/alloc-32.fir (+1-1)
(modified) flang/test/Fir/alloc.fir (+9-8)
(modified) flang/test/Fir/arrexp.fir (+2-2)
(modified) flang/test/Fir/basic-program.fir (+2)
(modified) flang/test/Fir/box.fir (+3-3)
(modified) flang/test/Fir/boxproc.fir (+4-12)
(modified) flang/test/Fir/embox.fir (+3-3)
(modified) flang/test/Fir/omp-reduction-embox-codegen.fir (+3-3)
(modified) flang/test/Fir/optional.fir (+1-2)
(modified) flang/test/Fir/pdt.fir (+3-3)
(modified) flang/test/Fir/rebox.fir (+9-9)
(modified) flang/test/Fir/select.fir (+1-1)
(modified) flang/test/Fir/target.fir (-4)
(modified) flang/test/Fir/tbaa-codegen2.fir (+3-9)
(modified) flang/test/Integration/OpenMP/map-types-and-sizes.f90 (+7-7)
(modified) flang/test/Lower/allocatable-polymorphic.f90 (-4)
(modified) flang/test/Lower/forall/character-1.f90 (+2-2)
(added) mlir/include/mlir/Dialect/LLVMIR/Transforms/OpenMPOffloadPrivatizationPrepare.h (+23)
(modified) mlir/include/mlir/Dialect/LLVMIR/Transforms/Passes.td (+12)
(modified) mlir/include/mlir/Dialect/OpenMP/OpenMPOps.td (+2-2)
(modified) mlir/lib/Dialect/LLVMIR/Transforms/CMakeLists.txt (+2)
(added) mlir/lib/Dialect/LLVMIR/Transforms/OpenMPOffloadPrivatizationPrepare.cpp (+425)
(modified) mlir/lib/Target/LLVMIR/Dialect/OpenMP/OpenMPToLLVMIRTranslation.cpp (+2-9)
(modified) mlir/lib/Tools/mlir-opt/MlirOptMain.cpp (+1)
(added) mlir/test/Dialect/LLVMIR/omp-offload-privatization-prepare.mlir (+167)
(modified) mlir/test/Target/LLVMIR/openmp-todo.mlir (-18)

diff --git a/flang/include/flang/Optimizer/Passes/Pipelines.h b/flang/include/flang/Optimizer/Passes/Pipelines.h
index a3f59ee8dd013..17d48f46e4b9b 100644
--- a/flang/include/flang/Optimizer/Passes/Pipelines.h
+++ b/flang/include/flang/Optimizer/Passes/Pipelines.h
@@ -22,6 +22,7 @@
 #include "mlir/Conversion/SCFToControlFlow/SCFToControlFlow.h"
 #include "mlir/Dialect/GPU/IR/GPUDialect.h"
 #include "mlir/Dialect/LLVMIR/LLVMAttrs.h"
+#include "mlir/Dialect/LLVMIR/Transforms/OpenMPOffloadPrivatizationPrepare.h"
 #include "mlir/Pass/PassManager.h"
 #include "mlir/Transforms/GreedyPatternRewriteDriver.h"
 #include "mlir/Transforms/Passes.h"
diff --git a/flang/lib/Optimizer/Passes/Pipelines.cpp b/flang/lib/Optimizer/Passes/Pipelines.cpp
index ca8e820608688..6a11461cd8380 100644
--- a/flang/lib/Optimizer/Passes/Pipelines.cpp
+++ b/flang/lib/Optimizer/Passes/Pipelines.cpp
@@ -403,6 +403,12 @@ void createMLIRToLLVMPassPipeline(mlir::PassManager &pm,
 
   // Add codegen pass pipeline.
   fir::createDefaultFIRCodeGenPassPipeline(pm, config, inputFilename);
+
+  // Run a pass to prepare for translation of delayed privatization in the
+  // context of deferred target tasks.
+  addNestedPassConditionally<mlir::LLVM::LLVMFuncOp>(pm, disableFirToLlvmIr,[&]() {
+    return mlir::LLVM::createPrepareForOMPOffloadPrivatizationPass();
+  });
 }
 
 } // namespace fir
diff --git a/flang/test/Driver/tco-emit-final-mlir.fir b/flang/test/Driver/tco-emit-final-mlir.fir
index 75f8f153127af..177810cf41378 100644
--- a/flang/test/Driver/tco-emit-final-mlir.fir
+++ b/flang/test/Driver/tco-emit-final-mlir.fir
@@ -13,7 +13,7 @@
 // CHECK: llvm.return
 // CHECK-NOT: func.func
 
-func.func @_QPfoo() {
+func.func @_QPfoo() -> !fir.ref<i32> {
   %1 = fir.alloca i32
-  return
+  return %1 : !fir.ref<i32>
 }
diff --git a/flang/test/Driver/tco-test-gen.fir b/flang/test/Driver/tco-test-gen.fir
index 38d4e50ecf3aa..15483f7ee3534 100644
--- a/flang/test/Driver/tco-test-gen.fir
+++ b/flang/test/Driver/tco-test-gen.fir
@@ -42,11 +42,10 @@ func.func @_QPtest(%arg0: !fir.ref<i32> {fir.bindc_name = "num"}, %arg1: !fir.re
 // CHECK-SAME:      %[[ARG2:.*]]: !llvm.ptr {fir.bindc_name = "ub", llvm.nocapture},
 // CHECK-SAME:      %[[ARG3:.*]]: !llvm.ptr {fir.bindc_name = "step", llvm.nocapture}) {
 
+// CMPLX:           %[[VAL_3:.*]] = llvm.mlir.constant(0 : index) : i64
+// CMPLX:           %[[VAL_2:.*]] = llvm.mlir.constant(1 : index) : i64
 // CMPLX:           %[[VAL_0:.*]] = llvm.mlir.constant(1 : i64) : i64
 // CMPLX:           %[[VAL_1:.*]] = llvm.alloca %[[VAL_0]] x i32 {bindc_name = "i"} : (i64) -> !llvm.ptr
-// CMPLX:           %[[VAL_2:.*]] = llvm.mlir.constant(1 : index) : i64
-// CMPLX:           %[[VAL_3:.*]] = llvm.mlir.constant(0 : index) : i64
-// CMPLX:           %[[VAL_4:.*]] = llvm.mlir.constant(1 : i64) : i64
 
 // SIMPLE:          %[[VAL_3:.*]] = llvm.mlir.constant(0 : index) : i64
 // SIMPLE:          %[[VAL_2:.*]] = llvm.mlir.constant(1 : index) : i64
diff --git a/flang/test/Fir/alloc-32.fir b/flang/test/Fir/alloc-32.fir
index a3cbf200c24fc..f57f6ce6fcf5e 100644
--- a/flang/test/Fir/alloc-32.fir
+++ b/flang/test/Fir/alloc-32.fir
@@ -19,7 +19,7 @@ func.func @allocmem_scalar_nonchar() -> !fir.heap<i32> {
 // CHECK-LABEL: define ptr @allocmem_scalar_dynchar(
 // CHECK-SAME: i32 %[[len:.*]])
 // CHECK: %[[mul1:.*]] = sext i32 %[[len]] to i64
-// CHECK: %[[mul2:.*]] = mul i64 1, %[[mul1]]
+// CHECK: %[[mul2:.*]] = mul i64 %[[mul1]], 1
 // CHECK: %[[cmp:.*]] = icmp sgt i64 %[[mul2]], 0
 // CHECK: %[[sz:.*]] = select i1 %[[cmp]], i64 %[[mul2]], i64 1
 // CHECK: %[[trunc:.*]] = trunc i64 %[[sz]] to i32
diff --git a/flang/test/Fir/alloc.fir b/flang/test/Fir/alloc.fir
index 8da8b828c18b9..0d3ce323d0d7c 100644
--- a/flang/test/Fir/alloc.fir
+++ b/flang/test/Fir/alloc.fir
@@ -86,7 +86,7 @@ func.func @alloca_scalar_dynchar_kind(%l : i32) -> !fir.ref<!fir.char<2,?>> {
 // CHECK-LABEL: define ptr @allocmem_scalar_dynchar(
 // CHECK-SAME: i32 %[[len:.*]])
 // CHECK: %[[mul1:.*]] = sext i32 %[[len]] to i64
-// CHECK: %[[mul2:.*]] = mul i64 1, %[[mul1]]
+// CHECK: %[[mul2:.*]] = mul i64 %[[mul1]], 1
 // CHECK: %[[cmp:.*]] = icmp sgt i64 %[[mul2]], 0
 // CHECK: %[[size:.*]] = select i1 %[[cmp]], i64 %[[mul2]], i64 1
 // CHECK: call ptr @malloc(i64 %[[size]])
@@ -98,7 +98,7 @@ func.func @allocmem_scalar_dynchar(%l : i32) -> !fir.heap<!fir.char<1,?>> {
 // CHECK-LABEL: define ptr @allocmem_scalar_dynchar_kind(
 // CHECK-SAME: i32 %[[len:.*]])
 // CHECK: %[[mul1:.*]] = sext i32 %[[len]] to i64
-// CHECK: %[[mul2:.*]] = mul i64 2, %[[mul1]]
+// CHECK: %[[mul2:.*]] = mul i64 %[[mul1]], 2
 // CHECK: %[[cmp:.*]] = icmp sgt i64 %[[mul2]], 0
 // CHECK: %[[size:.*]] = select i1 %[[cmp]], i64 %[[mul2]], i64 1
 // CHECK: call ptr @malloc(i64 %[[size]])
@@ -185,7 +185,7 @@ func.func @alloca_dynarray_of_nonchar2(%e: index) -> !fir.ref<!fir.array<?x?xi32
 
 // CHECK-LABEL: define ptr @allocmem_dynarray_of_nonchar(
 // CHECK-SAME: i64 %[[extent:.*]])
-// CHECK: %[[prod1:.*]] = mul i64 12, %[[extent]]
+// CHECK: %[[prod1:.*]] = mul i64 %[[extent]], 12
 // CHECK: %[[cmp:.*]] = icmp sgt i64 %[[prod1]], 0
 // CHECK: %[[size:.*]] = select i1 %[[cmp]], i64 %[[prod1]], i64 1
 // CHECK: call ptr @malloc(i64 %[[size]])
@@ -196,7 +196,7 @@ func.func @allocmem_dynarray_of_nonchar(%e: index) -> !fir.heap<!fir.array<3x?xi
 
 // CHECK-LABEL: define ptr @allocmem_dynarray_of_nonchar2(
 // CHECK-SAME: i64 %[[extent:.*]])
-// CHECK: %[[prod1:.*]] = mul i64 4, %[[extent]]
+// CHECK: %[[prod1:.*]] = mul i64 %[[extent]], 4
 // CHECK: %[[prod2:.*]] = mul i64 %[[prod1]], %[[extent]]
 // CHECK: %[[cmp:.*]] = icmp sgt i64 %[[prod2]], 0
 // CHECK: %[[size:.*]] = select i1 %[[cmp]], i64 %[[prod2]], i64 1
@@ -227,7 +227,7 @@ func.func @alloca_dynarray_of_char2(%e : index) -> !fir.ref<!fir.array<?x?x!fir.
 
 // CHECK-LABEL: define ptr @allocmem_dynarray_of_char(
 // CHECK-SAME: i64 %[[extent:.*]])
-// CHECK: %[[prod1:.*]] = mul i64 60, %[[extent]]
+// CHECK: %[[prod1:.*]] = mul i64 %[[extent]], 60
 // CHECK: %[[cmp:.*]] = icmp sgt i64 %[[prod1]], 0
 // CHECK: %[[size:.*]] = select i1 %[[cmp]], i64 %[[prod1]], i64 1
 // CHECK: call ptr @malloc(i64 %[[size]])
@@ -238,7 +238,7 @@ func.func @allocmem_dynarray_of_char(%e : index) -> !fir.heap<!fir.array<3x?x!fi
 
 // CHECK-LABEL: define ptr @allocmem_dynarray_of_char2(
 // CHECK-SAME: i64 %[[extent:.*]])
-// CHECK: %[[prod1:.*]] = mul i64 20, %[[extent]]
+// CHECK: %[[prod1:.*]] = mul i64 %[[extent]], 20
 // CHECK: %[[prod2:.*]] = mul i64 %[[prod1]], %[[extent]]
 // CHECK: %[[cmp:.*]] = icmp sgt i64 %[[prod2]], 0
 // CHECK: %[[size:.*]] = select i1 %[[cmp]], i64 %[[mul2]], i64 1
@@ -286,7 +286,7 @@ func.func @allocmem_dynarray_of_dynchar(%l: i32, %e : index) -> !fir.heap<!fir.a
 // CHECK-LABEL: define ptr @allocmem_dynarray_of_dynchar2(
 // CHECK-SAME: i32 %[[len:.*]], i64 %[[extent:.*]])
 // CHECK: %[[a:.*]] = sext i32 %[[len]] to i64
-// CHECK: %[[prod1:.*]] = mul i64 2, %[[a]]
+// CHECK: %[[prod1:.*]] = mul i64 %[[a]], 2
 // CHECK: %[[prod2:.*]] = mul i64 %[[prod1]], %[[extent]]
 // CHECK: %[[prod3:.*]] = mul i64 %[[prod2]], %[[extent]]
 // CHECK: %[[cmp:.*]] = icmp sgt i64 %[[prod3]], 0
@@ -366,12 +366,13 @@ func.func @allocmem_array_with_holes_dynchar(%arg0: index, %arg1: index) -> !fir
 // CHECK:    %[[VAL_0:.*]] = alloca { ptr, i64, i32, i8, i8, i8, i8, ptr, [1 x i64] }, i64 1
 // CHECK:    %[[VAL_3:.*]] = alloca { ptr, i64, i32, i8, i8, i8, i8, [1 x [3 x i64]], ptr, [1 x i64] }, i64 1
 // CHECK:    %[[VAL_2:.*]] = alloca { ptr, i64, i32, i8, i8, i8, i8, ptr, [1 x i64] }, i64 1
-
+func.func private @foo(%0: !fir.ref<!fir.class<none>>, %1: !fir.ref<!fir.class<!fir.array<?xnone>>>, %2: !fir.ref<!fir.box<none>>, %3: !fir.ref<!fir.box<!fir.array<?xnone>>>)
 func.func @alloca_unlimited_polymorphic_box() {
   %0 = fir.alloca !fir.class<none>
   %1 = fir.alloca !fir.class<!fir.array<?xnone>>
   %2 = fir.alloca !fir.box<none>
   %3 = fir.alloca !fir.box<!fir.array<?xnone>>
+  fir.call @foo(%0, %1, %2, %3) : (!fir.ref<!fir.class<none>>, !fir.ref<!fir.class<!fir.array<?xnone>>>, !fir.ref<!fir.box<none>>, !fir.ref<!fir.box<!fir.array<?xnone>>>) -> ()
   return
 }
 // Note: allocmem of fir.box are not possible (fir::HeapType::verify does not
diff --git a/flang/test/Fir/arrexp.fir b/flang/test/Fir/arrexp.fir
index e8ec8ac79e0c2..2eb717228d998 100644
--- a/flang/test/Fir/arrexp.fir
+++ b/flang/test/Fir/arrexp.fir
@@ -143,9 +143,9 @@ func.func @f6(%arg0: !fir.box<!fir.array<?xf32>>, %arg1: f32) {
   %c9 = arith.constant 9 : index
   %c10 = arith.constant 10 : index
 
-  // CHECK: %[[EXT_GEP:.*]] = getelementptr {{.*}} %[[A]], i32 0, i32 7, i64 0, i32 1
+  // CHECK: %[[EXT_GEP:.*]] = getelementptr {{.*}} %[[A]], i32 0, i32 7, i32 0, i32 1
   // CHECK: %[[EXTENT:.*]] = load i64, ptr %[[EXT_GEP]]
-  // CHECK: %[[SIZE:.*]] = mul i64 4, %[[EXTENT]]
+  // CHECK: %[[SIZE:.*]] = mul i64 %[[EXTENT]], 4
   // CHECK: %[[CMP:.*]] = icmp sgt i64 %[[SIZE]], 0
   // CHECK: %[[SZ:.*]] = select i1 %[[CMP]], i64 %[[SIZE]], i64 1
   // CHECK: %[[MALLOC:.*]] = call ptr @malloc(i64 %[[SZ]])
diff --git a/flang/test/Fir/basic-program.fir b/flang/test/Fir/basic-program.fir
index c9fe53bf093a1..6bad03dded24d 100644
--- a/flang/test/Fir/basic-program.fir
+++ b/flang/test/Fir/basic-program.fir
@@ -158,4 +158,6 @@ func.func @_QQmain() {
 // PASSES-NEXT:  LowerNontemporalPass
 // PASSES-NEXT: FIRToLLVMLowering
 // PASSES-NEXT: ReconcileUnrealizedCasts
+// PASSES-NEXT: 'llvm.func' Pipeline
+// PASSES-NEXT: PrepareForOMPOffloadPrivatizationPass
 // PASSES-NEXT: LLVMIRLoweringPass
diff --git a/flang/test/Fir/box.fir b/flang/test/Fir/box.fir
index c0cf3d8375983..760fbd4792122 100644
--- a/flang/test/Fir/box.fir
+++ b/flang/test/Fir/box.fir
@@ -57,7 +57,7 @@ func.func @fa(%a : !fir.ref<!fir.array<100xf32>>) {
 // CHECK-SAME: ptr {{[^%]*}}%[[res:.*]], ptr {{[^%]*}}%[[arg0:.*]], i64 %[[arg1:.*]])
 func.func @b1(%arg0 : !fir.ref<!fir.char<1,?>>, %arg1 : index) -> !fir.box<!fir.char<1,?>> {
   // CHECK: %[[alloca:.*]] = alloca { ptr, i64, i32, i8, i8, i8, i8 }
-  // CHECK: %[[size:.*]] = mul i64 1, %[[arg1]]
+  // CHECK: %[[size:.*]] = mul i64 %[[arg1]], 1
   // CHECK: insertvalue {{.*}} undef, i64 %[[size]], 1
   // CHECK: insertvalue {{.*}} i32 20240719, 2
   // CHECK: insertvalue {{.*}} ptr %[[arg0]], 0
@@ -89,7 +89,7 @@ func.func @b2(%arg0 : !fir.ref<!fir.array<?x!fir.char<1,5>>>, %arg1 : index) ->
 func.func @b3(%arg0 : !fir.ref<!fir.array<?x!fir.char<1,?>>>, %arg1 : index, %arg2 : index) -> !fir.box<!fir.array<?x!fir.char<1,?>>> {
   %1 = fir.shape %arg2 : (index) -> !fir.shape<1>
   // CHECK: %[[alloca:.*]] = alloca { ptr, i64, i32, i8, i8, i8, i8, [1 x [3 x i64]] }
-  // CHECK: %[[size:.*]] = mul i64 1, %[[arg1]]
+  // CHECK: %[[size:.*]] = mul i64 %[[arg1]], 1
   // CHECK: insertvalue {{.*}} i64 %[[size]], 1
   // CHECK: insertvalue {{.*}} i32 20240719, 2
   // CHECK: insertvalue {{.*}} i64 %[[arg2]], 7, 0, 1
@@ -108,7 +108,7 @@ func.func @b4(%arg0 : !fir.ref<!fir.array<7x!fir.char<1,?>>>, %arg1 : index) ->
   %c_7 = arith.constant 7 : index
   %1 = fir.shape %c_7 : (index) -> !fir.shape<1>
   // CHECK: %[[alloca:.*]] = alloca { ptr, i64, i32, i8, i8, i8, i8, [1 x [3 x i64]] }
-  // CHECK:   %[[size:.*]] = mul i64 1, %[[arg1]]
+  // CHECK:   %[[size:.*]] = mul i64 %[[arg1]], 1
   // CHECK: insertvalue {{.*}} i64 %[[size]], 1
   // CHECK: insertvalue {{.*}} i32 20240719, 2
   // CHECK: insertvalue {{.*}} i64 7, 7, 0, 1
diff --git a/flang/test/Fir/boxproc.fir b/flang/test/Fir/boxproc.fir
index 97d9b38ed6f40..d4c36a4f5b213 100644
--- a/flang/test/Fir/boxproc.fir
+++ b/flang/test/Fir/boxproc.fir
@@ -82,12 +82,8 @@ func.func @_QPtest_proc_dummy_other(%arg0: !fir.boxproc<() -> ()>) {
 // CHECK:         store [1 x i8] c" ", ptr %[[VAL_18]], align 1
 // CHECK:         call void @llvm.init.trampoline(ptr %[[VAL_20]], ptr @_QFtest_proc_dummy_charPgen_message, ptr %[[VAL_2]])
 // CHECK:         %[[VAL_23:.*]] = call ptr @llvm.adjust.trampoline(ptr %[[VAL_20]])
-// CHECK:         %[[VAL_25:.*]] = insertvalue { ptr, i64 } undef, ptr %[[VAL_23]], 0
-// CHECK:         %[[VAL_26:.*]] = insertvalue { ptr, i64 } %[[VAL_25]], i64 10, 1
 // CHECK:         %[[VAL_27:.*]] = call ptr @llvm.stacksave.p0()
-// CHECK:         %[[VAL_28:.*]] = extractvalue { ptr, i64 } %[[VAL_26]], 0
-// CHECK:         %[[VAL_29:.*]] = extractvalue { ptr, i64 } %[[VAL_26]], 1
-// CHECK:         %[[VAL_30:.*]] = call { ptr, i64 } @_QPget_message(ptr %[[VAL_0]], i64 40, ptr %[[VAL_28]], i64 %[[VAL_29]])
+// CHECK:         %[[VAL_30:.*]] = call { ptr, i64 } @_QPget_message(ptr %[[VAL_0]], i64 40, ptr %[[VAL_23]], i64 10)
 // CHECK:         %[[VAL_32:.*]] = call i1 @_FortranAioOutputAscii(ptr %{{.*}}, ptr %[[VAL_0]], i64 40)
 // CHECK:         call void @llvm.stackrestore.p0(ptr %[[VAL_27]])
 
@@ -115,14 +111,10 @@ func.func @_QPtest_proc_dummy_other(%arg0: !fir.boxproc<() -> ()>) {
 // CHECK-LABEL: define { ptr, i64 } @_QPget_message(ptr
 // CHECK-SAME:                  %[[VAL_0:.*]], i64 %[[VAL_1:.*]], ptr %[[VAL_2:.*]], i64
 // CHECK-SAME:                                                 %[[VAL_3:.*]])
-// CHECK:         %[[VAL_4:.*]] = insertvalue { ptr, i64 } undef, ptr %[[VAL_2]], 0
-// CHECK:         %[[VAL_5:.*]] = insertvalue { ptr, i64 } %[[VAL_4]], i64 %[[VAL_3]], 1
-// CHECK:         %[[VAL_7:.*]] = extractvalue { ptr, i64 } %[[VAL_5]], 0
-// CHECK:         %[[VAL_8:.*]] = extractvalue { ptr, i64 } %[[VAL_5]], 1
 // CHECK:         %[[VAL_9:.*]] = call ptr @llvm.stacksave.p0()
-// CHECK:         %[[VAL_10:.*]] = alloca i8, i64 %[[VAL_8]], align 1
-// CHECK:         %[[VAL_12:.*]] = call { ptr, i64 } %[[VAL_7]](ptr %[[VAL_10]], i64 %[[VAL_8]])
-// CHECK:         %[[VAL_13:.*]] = add i64 %[[VAL_8]], 12
+// CHECK:         %[[VAL_10:.*]] = alloca i8, i64 %[[VAL_3]], align 1
+// CHECK:         %[[VAL_12:.*]] = call { ptr, i64 } %[[VAL_2]](ptr %[[VAL_10]], i64 %[[VAL_3]])
+// CHECK:         %[[VAL_13:.*]] = add i64 %[[VAL_3]], 12
 // CHECK:         %[[VAL_14:.*]] = alloca i8, i64 %[[VAL_13]], align 1
 // CHECK:         call void @llvm.memmove.p0.p0.i64(ptr %[[VAL_14]], ptr {{.*}}, i64 12, i1 false)
 // CHECK:         %[[VAL_18:.*]] = phi i64
diff --git a/flang/test/Fir/embox.fir b/flang/test/Fir/embox.fir
index 0f304cff2c79e..11f7457b6873c 100644
--- a/flang/test/Fir/embox.fir
+++ b/flang/test/Fir/embox.fir
@@ -11,7 +11,7 @@ func.func @_QPtest_callee(%arg0: !fir.box<!fir.array<?xi32>>) {
 func.func @_QPtest_slice() {
 // CHECK:  %[[a1:.*]] = alloca { ptr, i64, i32, i8, i8, i8, i8, [1 x [3 x i64]] }, align 8
 // CHECK:  %[[a2:.*]] = alloca [20 x i32], i64 1, align 4
-// CHECK:  %[[a3:.*]] = getelementptr [20 x i32], ptr %[[a2]], i64 0, i64 0
+// CHECK:  %[[a3:.*]] = getelementptr [20 x i32], ptr %[[a2]], i32 0, i64 0
 // CHECK:  %[[a4:.*]] = insertvalue { ptr, i64, i32, i8, i8, i8, i8, [1 x [3 x i64]] }
 // CHECK:  { ptr undef, i64 4, i32 20240719, i8 1, i8 9, i8 0, i8 0, [1 x [3 x i64]]
 // CHECK: [i64 1, i64 5, i64 8]] }, ptr %[[a3]], 0
@@ -38,7 +38,7 @@ func.func @_QPtest_dt_callee(%arg0: !fir.box<!fir.array<?xi32>>) {
 func.func @_QPtest_dt_slice() {
 // CHECK:  %[[a1:.*]] = alloca { ptr, i64, i32, i8, i8, i8, i8, [1 x [3 x i64]] }, align 8
 // CHECK:  %[[a3:.*]] = alloca [20 x %_QFtest_dt_sliceTt], i64 1, align 8
-// CHECK:  %[[a4:.*]] = getelementptr [20 x %_QFtest_dt_sliceTt], ptr %[[a3]], i64 0, i64 0, i32 0
+// CHECK:  %[[a4:.*]] = getelementptr [20 x %_QFtest_dt_sliceTt], ptr %[[a3]], i32 0, i64 0, i32 0
 // CHECK: %[[a5:.*]] = insertvalue { ptr, i64, i32, i8, i8, i8, i8, [1 x [3 x i64]] }
 // CHECK-SAME: { ptr undef, i64 4, i32 20240719, i8 1, i8 9, i8 0, i8 0, [1 x [3 x i64]]
 // CHECK-SAME: [i64 1, i64 5, i64 16
@@ -73,7 +73,7 @@ func.func @emboxSubstring(%arg0: !fir.ref<!fir.array<2x3x!fir.char<1,4>>>) {
   %0 = fir.shape %c2, %c3 : (index, index) -> !fir.shape<2>
   %1 = fir.slice %c1, %c2, %c1, %c1, %c3, %c1 substr %c1_i64, %c2_i64 : (index, index, index, index, index, index, i64, i64) -> !fir.slice<2>
   %2 = fir.embox %arg0(%0) [%1] : (!fir.ref<!fir.array<2x3x!fir.char<1,4>>>, !fir.shape<2>, !fir.slice<2>) -> !fir.box<!fir.array<?x?x!fir.char<1,?>>>
-  // CHECK: %[[addr:.*]] = getelementptr [3 x [2 x [4 x i8]]], ptr %[[arg0]], i64 0, i64 0, i64 0, i64 1
+  // CHECK: %[[addr:.*]] = getelementptr [3 x [2 x [4 x i8]]], ptr %[[arg0]], i32 0, i64 0, i64 0, i32 1
   // CHECK: insertvalue {[[descriptorType:.*]]} { ptr undef, i64 2, i32 20240719, i8 2, i8 40, i8 0, i8 0
   // CHECK-SAME: [2 x [3 x i64]] [{{\[}}3 x i64] [i64 1, i64 2, i64 4], [3 x i64] [i64 1, i64 3, i64 8]] }
   // CHECK-SAME: ptr %[[addr]], 0
diff --git a/flang/test/Fir/omp-reduction-embox-codegen.fir b/flang/test/Fir/omp-reduction-embox-codegen.fir
index 1645e1a407ad4..e517b1352ff5c 100644
--- a/flang/test/Fir/omp-reduction-embox-codegen.fir
+++ b/flang/test/Fir/omp-reduction-embox-codegen.fir
@@ -23,14 +23,14 @@ omp.declare_reduction @test_reduction : !fir.ref<!fir.box<i32>> init {
   omp.yield(%0 : !fir.ref<!fir.box<i32>>)
 }
 
-func.func @_QQmain() attributes {fir.bindc_name = "reduce"} {
+func.func @_QQmain()  -> !fir.ref<!fir.box<i32>> attributes {fir.bindc_name = "reduce"} {
   %4 = fir.alloca !fir.box<i32>
   omp.parallel reduction(byref @test_reduction %4 -> %arg0 : !fir.ref<!fir.box<i32>>) {
     omp.terminator
   }
-  return
+  return %4: !fir.ref<!fir.box<i32>>
 }
 
 // basically we are testing that there isn't a crash
-// CHECK-LABEL: define void @_QQmain
+// CHECK-LABEL: define ptr @_QQmain
 // CHECK-NEXT:    alloca { ptr, i64, i32, i8, i8, i8, i8 }, i64 1, align 8
diff --git a/flang/test/Fir/optional.fir b/flang/test/Fir/optional.fir
index bded8b5332a30..66ff69f083467 100644
--- a/flang/test/Fir/optional.fir
+++ b/flang/test/Fir/optional.fir
@@ -37,8 +37,7 @@ func.func @bar2() -> i1 {
 
 // CHECK-LABEL: @foo3
 func.func @foo3(%arg0: !fir.boxchar<1>) -> i1 {
-  // CHECK: %[[extract:.*]] = extractvalue { ptr, i64 } %{{.*}}, 0
-  // CHECK: %[[ptr:.*]] = ptrtoint ptr %[[extract]] to i64
+  // CHECK: %[[ptr:.*]] = ptrtoint ptr %0 to i64
   // CHECK: icmp ne i64 %[[ptr]], 0
   %0 = fir.is_present %arg0 : (!fir.boxchar<1>) -> i1
   return %0 : i1
diff --git a/flang/test/Fir/pdt.fir b/flang/test/Fir/pdt.fir
index a200cd7e7cc03..411927aae6bdf 100644
--- a/flang/test/Fir/pdt.fir
+++ b/flang/test/Fir/pdt.fir
@@ -96,13 +96,13 @@ func.func @_QTt1P.f2.offset(%0 : i32, %1 : i32) -> i32 {
 
 func.func private @bar(!fir.ref<!fir.char<1,?>>)
 
-// CHECK-LABEL: define void @_QPfoo(i32 %0, i32 %1)
-func.func @_QPfoo(%arg0 : i32, %arg1 : i32) {
+// CHECK-LABEL: define ptr @_QPfoo(i32 %0, i32 %1)
+func.func @_QPfoo(%arg0 : i32, %arg1 : i32) -> !fir.ref<!fir.type<_QTt1>> {
   // CHECK: %[[size:.*]] = call i64 @_QTt1P.mem.size(i32 %0, i32 %1)
   // CHECK: %[[alloc:.*]] = alloca i8, i64 %[[size]]
   %0 = fir.alloca !fir.type<_QTt1(p1:i32,p2:i32){f1:!fir.char<1,?>,f2:!fir.char<1,?>}>(%arg0, %arg1 : i32, i32)
   //%2 = fir.coordinate_of %0, f2 : (!fir.ref<!fir.type<_QTt1>>) -> !fir.ref<!fir.char<1,?>>
   %2 = fir.zero_bits !fir.ref<!fir.char<1,?>>
   fir.call @bar(%2) : (!fir.ref<!fir.char<1,?>>) -> ()
-  return
+  return %0 : !fir.ref<!fir.type<_QTt1>>
 }
diff --git a/flang/test/Fir/rebox.fir b/flang/test/Fir/rebox.fir
index 0c9f6d9bb94ad..d858adfb7c45d 100644
--- a/flang/test/Fir/rebox.fir
+++ b/flang/test/Fir/rebox.fir
@@ -36,7 +36,7 @@ func.func @test_rebox_1(%arg0: !fir.box<!fir.array<?x?xf32>>) {
   // CHECK: %[[VOIDBASE0:.*]] = getelementptr i8, ptr %[[INBASE]], i64 %[[OFFSET_0]]
   // CHECK: %[[OFFSET_1:.*]] = mul i64 2, %[[INSTRIDE_1]]
   // CHECK: %[[VOIDBASE1:.*]] = getelementptr i8, ptr %[[VOIDBASE0]], i64 %[[OFFSET_1]]
-  // CHECK: %[[OUTSTRIDE0:.*]] = mul i64 3, %[[INSTRIDE_1]]
+  // CHECK: %[[OUTSTRIDE0:.*]] = mul i64 %[[INSTRIDE_1]], 3
   // CHECK: %[[OUTBOX1:.*]] = insertvalue { ptr, i64, i32, i8, i8, i8, i8, [1 x [3 x i64]] } %{{.*}}, i64 %[[OUTSTRIDE0]], 7, 0, 2
   // CHECK: %[[OUTBOX2:.*]] = insertvalue { ptr, i64, i32, i8, i8, i8, i8, [1 x [3 x i64]] } %[[OUTBOX1]], ptr %[[VOIDBASE1]], 0
   // CHECK: store { ptr, i64, i32, i8, i8, i8, i8, [1 x [3 x i64]] } %[[OUTBOX2]], ptr %[[OUTBOX_ALLOC]], align 8
@@ -63,7 +63,7 @@ func.func @test_rebox_2(%arg0: !fir.box<!fir.array<?x?x!fir.char<1,?>>>) {
   // CHECK: %[[OUTBOX:.*]] = alloca { ptr, i64, i32, i8, i8, i8, i8, [2 x [3 x i64]] }
   // CHECK: %[[LEN_GEP:.*]] = getelementptr { ptr, i64, i32, i8, i8, i8, i8, [2 x [3 x i64]] }, ptr %[[INBOX]], i32 0, i32 1
   // CHECK: %[[LEN:...
[truncated]

github-actions · 2025-08-26T03:24:56Z

✅ With the latest revision this PR passed the C/C++ code formatter.

tblah

Why did you decide to do this in a pass instead of handling it in MLIR -> LLVM conversion as is done for omp task?

bhandarkar-pranav · 2025-08-26T20:34:48Z

Why did you decide to do this in a pass instead of handling it in MLIR -> LLVM conversion as is done for omp task?

I did anticipate this question, especially because MLIR -> LLVM translation is where I first started out, with the intention of extending your work on omp.task to omp.target. In hindsight, I should have included the explanation in my commit message itself; I am sorry about that.

A couple of reasons make MLIR -> LLVM IR translation too late a point to do this. Too late as in not impossible, but arguably harder to get correct and to maintain thereafter. Essentially, what we need to do is

1. Copy the privatized variable from the stack to the heap.
2. Fix up any captures of the address of the privatized variable that are used by the omp.target. Typically, these would be omp.map.info operations.

At the time of MLIR -> LLVM conversion the address of the private variable is "captured" into some in-memory data structures, such as those that process map operations (MapInfoData). MapInfoData is then used to codegen the arrays of pointers to be offloaded (offload_baseptrs and offload_ptrs).

Now, to allocate heap memory for the private variable, we'd have two options

1. Create the allocation after omp.map.info operations are processed to create MapInfoData data structures but before OMPIRBuilder codegens offload_baseptrs and offload_ptrs. This would involve going back into the MapInfoData structures and updating the pointers to private variables with the heap-allocated addresses.
2. Create the allocation after the arrays of offloaded pointers have been created by OMPIRBuilder. In this case, we'd have to keep track of the relevant entry and go back and create LLVM IR (.ll) to update some index of offload_baseptrs with the heap-allocated address. For instance, this is what it'd look like if index 2 (out of 4 entries) of offload_baseptrs was the address of a private variable

%priv_var = alloca ...
%0 = getelementptr [4 x ptr], ptr %offload_baseptrs, i64 0, i64 2
store ptr %priv_var, ptr %0
...
...
%priv_var_heap = call ptr @malloc(i64 %size_of_priv_var)
%1 = getelementptr [4 x ptr], ptr %offload_baseptrs, i64 0, i64 2
store ptr %priv_var_heap, ptr %1

This requires additional bookkeeping and coordination between OpenMPToIRTranslation and OMPIRBuilder to record that index 2 of offload_baseptrs needs to be updated just before the offloading call is made. I feel option 1 is better than option 2 because the data structures to be updated are not in LLVM IR, but simply in memory. But if we instead update the original omp.map.info operations themselves to use heap memory (i.e. do the allocation before MLIR -> LLVM IR translation), then the rest of the process proceeds smoothly without us having to go back and update anything. We achieve a clean separation of concerns this way.
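
The lifetime issue that motivates the heap copy can be illustrated outside of MLIR. The following is a minimal C++ sketch, not code from this PR: `DeferredTask` and `makeDeferredSum` are hypothetical names, and a plain callable that runs only after its creating function has returned stands in for the deferred target task.

```cpp
#include <functional>
#include <memory>
#include <numeric>
#include <vector>

// Hypothetical stand-in for a deferred (nowait) target task: a callable that
// the caller invokes only after the creating function has already returned.
using DeferredTask = std::function<int()>;

DeferredTask makeDeferredSum() {
  // The "host variable" lives in this stack frame and dies when we return.
  std::vector<int> hostVar = {1, 2, 3, 4};

  // Copy it to the heap so the data outlives the frame. The task captures
  // the heap copy, never the address of the stack variable.
  auto heapCopy = std::make_shared<const std::vector<int>>(hostVar);

  return [heapCopy] {
    return std::accumulate(heapCopy->begin(), heapCopy->end(), 0);
  };
}
```

Capturing `hostVar` by reference instead would leave the callable with a dangling reference once `makeDeferredSum` returns. In this sketch the `shared_ptr` releases the copy when the callable is destroyed; the pass described in this PR instead frees the heap memory in a separate `omp.task` that is ordered after the `omp.target`.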

…get-tasks

This patch adds support for translation of the private clause on deferred target tasks - that is `omp.target` operations with the `nowait` clause.

An offloading call for a deferred target-task is not blocking - the offloading host task continues its execution after issuing the offloading call. Therefore, the key problem we need to solve is to ensure that the data needed for private variables to be initialized in the target task persists even after the host task has completed.

We do this in a new pass called PrepareForOMPOffloadPrivatizationPass. For a privatized variable that needs its host counterpart for initialization (such as the shape of the data from the descriptor when an allocatable is privatized or the value of the data when an allocatable is firstprivatized),
 - the pass allocates memory on the heap.
 - it then initializes this memory by copying the contents of the host variable to the newly allocated location on the heap.
 - Then, the pass updates all the `omp.map.info` operations that pointed to the host variable to now point to the one located in the heap.

The pass uses a rewrite pattern applied using the greedy pattern matcher, which in turn does some constant folding and DCE. Because of this, a number of lit tests had to be updated. In GEPs, constants get folded into indices and truncated to i32 types. In some tests, sequences of insertvalue and extractvalue instructions cancel out, so these needed to be updated too.

tblah · 2025-08-27T14:13:40Z

Ahh I see what you mean. This is different because as well as being (first)private, these variables may also be mapped, which adds another layer of complexity. I haven't followed much about mapping so I will take your word for it and leave it for experts in offloading to give their opinions.

mlir/lib/Dialect/LLVMIR/Transforms/OpenMPOffloadPrivatizationPrepare.cpp

tblah · 2025-08-27T14:28:30Z

mlir/lib/Dialect/LLVMIR/Transforms/OpenMPOffloadPrivatizationPrepare.cpp

+
+      // Allocate heap memory that corresponds to the type of memory
+      // pointed to by varPtr
+      // TODO: For boxchars this likely wont be a pointer.


Yes you can see in the code for tasks that boxchars are a hack because you can't really have a !fir.ref<!fir.boxchar<>> in the FIR type system. This is handled for tasks so you can see what I did there.

Thank you, I'll take a look.

tblah · 2025-08-27T14:31:44Z

mlir/lib/Dialect/LLVMIR/Transforms/OpenMPOffloadPrivatizationPrepare.cpp

+      // Copy the value of the local variable into the heap-allocated location.
+      mlir::Location loc = chainOfOps.front()->getLoc();
+      mlir::Type varType = getElemType(varPtr);
+      auto loadVal = rewriter.create<LLVM::LoadOp>(loc, varType, varPtr);


What about more complex types e.g. arrays, derived types?

For firstprivate you can use the copy region in the privatizer. For plain private you just need to use an init region to initialise non-trivial types but don't need to copy. This initialisation and copying must happen synchronously.

You are right that there is a problem here. The problem is that derived types could have any number of pointers and therefore deep copies will be needed. Are you suggesting that I use the copy region of the privatizer by cloning it to get the deep copy that I need? Before that, though, I'd have to allocate memory for each pointer inside the derived type. I was hoping to tackle derived types in a subsequent PR, which I should have made clear in this PR.

If you intend to handle these in a later PR, please catch this and create a useful error message (similar to the "not yet implemented" messages generated from OpenMP MLIR dialect to LLVM IR conversion).

Yes, you can do the initial allocation/initialization for the derived type by inlining the init region from the privatizer. If it is firstprivate then you can inline the copy region to get a type-appropriate copy. As far as I know, init+copy should be sufficient without any extra allocation.
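
The division of labour between the init and copy regions can be sketched in plain C++. This is only an analogy under stated assumptions, not the privatizer's actual code: `Derived`, `initPrivate`, `copyPrivate` and `deallocPrivate` are hypothetical names, and the derived type is assumed to have a single allocatable component.

```cpp
#include <cstddef>
#include <cstdlib>
#include <cstring>

// Hypothetical derived type with one allocatable component.
struct Derived {
  int *data;
  std::size_t n;
};

// "init": give the private copy its own storage, sized from the original's
// shape. Needed for both private and firstprivate. Returns false on failure.
bool initPrivate(Derived &priv, const Derived &orig) {
  priv.n = orig.n;
  priv.data = nullptr;
  if (orig.n == 0)
    return true;
  priv.data = static_cast<int *>(std::malloc(orig.n * sizeof(int)));
  return priv.data != nullptr;
}

// "copy": only for firstprivate; a deep copy of the component's values into
// the storage that init already allocated.
void copyPrivate(Derived &priv, const Derived &orig) {
  if (orig.n > 0)
    std::memcpy(priv.data, orig.data, orig.n * sizeof(int));
}

// "dealloc": release what init allocated.
void deallocPrivate(Derived &priv) {
  std::free(priv.data);
  priv.data = nullptr;
  priv.n = 0;
}
```

Because init already allocates the component's storage, copy only has to fill it in, which matches the remark above that init+copy should need no extra allocation.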

mlir/lib/Dialect/LLVMIR/Transforms/OpenMPOffloadPrivatizationPrepare.cpp

mlir/test/Dialect/LLVMIR/omp-offload-privatization-prepare.mlir

mlir/include/mlir/Dialect/OpenMP/OpenMPOps.td

mlir/lib/Dialect/LLVMIR/Transforms/OpenMPOffloadPrivatizationPrepare.cpp

mlir/lib/Tools/mlir-opt/MlirOptMain.cpp

mlir/include/mlir/Dialect/LLVMIR/Transforms/OpenMPOffloadPrivatizationPrepare.h

mlir/lib/Dialect/LLVMIR/Transforms/OpenMPOffloadPrivatizationPrepare.cpp

mlir/include/mlir/Dialect/LLVMIR/Transforms/Passes.td

bhandarkar-pranav · 2025-09-18T19:57:13Z

@tblah - I have made some updates to this pass based on your review. There are still a few things remaining, so if you want to wait before you review again, that'll be perfectly understandable.
Done in this update

Used the init and copy regions of the privatizer to initialize the heap-allocated variable and copy the original variable into it.
Fixed the pass for boxchars (and other variables, if any, that may be privatized by value).
Other minor review comments.

Remaining items

Test with derived types
Move the pass to the omp dialect rather than the llvmir dialect.

bhandarkar-pranav · 2025-09-29T19:42:34Z

@tblah - I have made some updates to this pass based on your review. There are still a few things remaining, so if you want to wait before you review again, that'll be perfectly understandable.

Done in this update

Used the init and copy regions of the privatizer to initialize the heap-allocated variable and copy the original variable into it.

Fixed the pass for boxchars (and other variables, if any, that may be privatized by value)

Other minor review comments.

Remaining items

Test with derived types

Move the pass to the omp dialect rather than the llvmir dialect.

@tblah - I have addressed the issues above. Could you please give this PR a look over once again.

tblah

Thank you for the updates

mlir/lib/Dialect/OpenMP/Transforms/OpenMPOffloadPrivatizationPrepare.cpp

mlir/lib/Dialect/OpenMP/Transforms/OpenMPOffloadPrivatizationPrepare.cpp

mlir/include/mlir/Dialect/OpenMP/Transforms/OpenMPOffloadPrivatizationPrepare.h

mlir/include/mlir/Dialect/OpenMP/Transforms/Passes.h

tblah

Just one comment that should be easy to address.

flang/lib/Optimizer/Passes/Pipelines.cpp

tblah · 2025-10-16T07:31:41Z

mlir/lib/Dialect/OpenMP/Transforms/OpenMPOffloadPrivatizationPrepare.cpp

+
+  LLVM::LLVMFuncOp getMalloc(ModuleOp mod, IRRewriter &rewriter) const {
+    llvm::FailureOr<LLVM::LLVMFuncOp> mallocCall =
+        LLVM::lookupOrCreateMallocFn(rewriter, mod, rewriter.getI64Type());


This call can mutate the module. It isn't safe to call this in a pass that runs on functions because then the functions can be processed in parallel and try to mutate the module in parallel, which tends to go badly. I think the pass will have to be made into a module pass.

The documentation for MLIR's concurrency model in passes is here https://mlir.llvm.org/docs/PassManagement/#operation-pass. In particular:

Must not modify the state of operations other than the operations that are nested under the current operation. This includes adding, modifying or removing other operations from an ancestor/parent block.

I guess technically we also need to be a module pass because we read from the privatizer operation, although in practice nothing is going to be modifying it at this point in the pipeline.
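
The constraint can be pictured with a small C++ model. This is a sketch under stated assumptions rather than MLIR code: `Module`, `Function`, `ensureDeclared`, `rewriteFunction` and `runOnModule` are hypothetical stand-ins, and the loop is serial here, whereas a pass manager may run the per-function step on several threads.

```cpp
#include <set>
#include <string>
#include <vector>

// Hypothetical, heavily simplified stand-ins for a module and its functions.
struct Function {
  std::string name;
  std::vector<std::string> calls; // callee symbols referenced by the body
};

struct Module {
  std::set<std::string> declarations; // module-level symbol table
  std::vector<Function> functions;
};

// Module-level step: the only place that mutates module-wide state.
void ensureDeclared(Module &mod, const std::string &symbol) {
  mod.declarations.insert(symbol);
}

// Function-level step: reads the module, mutates only its own function.
bool rewriteFunction(const Module &mod, Function &fn,
                     const std::string &symbol) {
  if (mod.declarations.count(symbol) == 0)
    return false; // declaration missing; refuse rather than create it here
  fn.calls.push_back(symbol);
  return true;
}

// Structured like a module pass: declare once up front, then visit functions.
bool runOnModule(Module &mod) {
  ensureDeclared(mod, "malloc");
  bool ok = true;
  for (Function &fn : mod.functions)
    ok = rewriteFunction(mod, fn, "malloc") && ok;
  return ok;
}
```

If `rewriteFunction` called `ensureDeclared` itself and were run concurrently for several functions, the shared symbol table would be mutated from multiple threads, which is the hazard described above.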

ergawy

Sorry for the delay here Pranav. Went through the whole PR. Awesome work, just a number of not so large comments.

mlir/include/mlir/Dialect/OpenMP/Transforms/Passes.td

ergawy · 2025-10-16T07:39:35Z

mlir/include/mlir/Dialect/OpenMP/Transforms/OpenMPOffloadPrivatizationPrepare.h

@@ -0,0 +1,22 @@
+//===- OpenMPOffloadPrivatizationPrepare.h ----------------------*- C++ -*-===//


I think this file should be deleted and then include Passes.h instead.

mlir/include/mlir/Dialect/OpenMP/Transforms/Passes.h

ergawy · 2025-10-16T07:49:01Z

mlir/include/mlir/Dialect/OpenMP/Transforms/Passes.td

+
+include "mlir/Pass/PassBase.td"
+
+def PrepareForOMPOffloadPrivatizationPass : Pass<"omp-offload-privatization-prepare", "::mlir::LLVM::LLVMFuncOp"> {


Should this be a module pass instead since the pass modifies the parent module?

Will change.

mlir/lib/Dialect/LLVMIR/Transforms/CMakeLists.txt

ergawy · 2025-10-16T09:04:14Z

mlir/lib/Dialect/OpenMP/Transforms/OpenMPOffloadPrivatizationPrepare.cpp

+            omp::MapInfoOp memberMap =
+                cast<omp::MapInfoOp>(member.getDefiningOp());
+            if (memberMap.getVarPtrPtr() &&
+                usesVarPtr(memberMap.getVarPtrPtr().getDefiningOp()))


Is it possible for VarPtrPtr to be a BlockArgument?

mlir/lib/Dialect/OpenMP/Transforms/OpenMPOffloadPrivatizationPrepare.cpp

ergawy · 2025-10-16T09:24:32Z

mlir/lib/Dialect/OpenMP/Transforms/OpenMPOffloadPrivatizationPrepare.cpp

+      rewriter.setInsertionPoint(targetOp);
+      Operation *newOp = rewriter.clone(*targetOp.getOperation());
+      omp::TargetOp newTargetOp = cast<omp::TargetOp>(newOp);
+      rewriter.modifyOpInPlace(newTargetOp, [&]() {
+        newTargetOp.getPrivateVarsMutable().assign(newPrivVars);
+      });
+      rewriter.replaceOp(targetOp, newTargetOp);


I think this is lighter weight than cloning and replacing the original op.

Suggested change

-      rewriter.setInsertionPoint(targetOp);
-      Operation *newOp = rewriter.clone(*targetOp.getOperation());
-      omp::TargetOp newTargetOp = cast<omp::TargetOp>(newOp);
-      rewriter.modifyOpInPlace(newTargetOp, [&]() {
-        newTargetOp.getPrivateVarsMutable().assign(newPrivVars);
-      });
-      rewriter.replaceOp(targetOp, newTargetOp);
+      targetOp.getPrivateVarsMutable().clear();
+      targetOp.getPrivateVarsMutable().assign(newPrivVars);

ergawy · 2025-10-16T09:30:29Z

mlir/lib/Dialect/OpenMP/Transforms/OpenMPOffloadPrivatizationPrepare.cpp

+
+  // Generate code to get the size of data being mapped from the bounds
+  // of mapInfoOp
+  Value getSizeInBytes(omp::MapInfoOp mapInfoOp, ModuleOp mod,


This function does not seem to be used anywhere. Should it be used by allocateHeapMem instead of the above overload?

Thank you for catching that. It was used earlier, but I found another way of getting this info.

ergawy · 2025-10-16T09:34:30Z

mlir/test/Dialect/OpenMP/omp-offload-privatization-prepare.mlir

@@ -0,0 +1,351 @@
+// RUN: mlir-opt --mlir-disable-threading -omp-offload-privatization-prepare --split-input-file %s | FileCheck %s


Can we split this test into multiple ones based on the privatized type? Just to make debugging and reading through the test(s) easier.

bhandarkar-pranav · 2025-10-16T14:43:01Z

Sorry for the delay here Pranav. Went through the whole PR. Awesome work, just a number of not so large comments.

No problem at all, thank you for the review. I'll take care of your comments today.

bhandarkar-pranav requested review from Meinersbur, TIFitis, agozillon, ergawy, jsjodin, skatrak and tblah August 26, 2025 03:21

llvmbot added the mlir:core, mlir:llvm, flang:driver, mlir, flang, mlir:openmp, flang:fir-hlfir and flang:openmp labels Aug 26, 2025

tblah reviewed Aug 26, 2025

bhandarkar-pranav added 3 commits August 26, 2025 16:51

Add some comments and clean up some codoe

f72152a

Fix CHECK stmts in test to account for constant folding done by the g…

c859bbc

…reedy pattern matcher

bhandarkar-pranav force-pushed the flang/delayed_priv_def_tgt_tasks_translation branch from ff8afbd to c859bbc Compare August 26, 2025 22:16

Fix clang-format issues

697cc4f

tblah reviewed Aug 27, 2025

Meinersbur reviewed Aug 28, 2025

Checkpoint commit, working with operaiton->walk

bc107cd

joker-eph reviewed Sep 12, 2025

bhandarkar-pranav added 5 commits September 18, 2025 12:48

use the init region to initialize the heap allocated private variable

7032cae

use the copy region to copy the private variable into its heap alloca…

269c575

…ted version

Fix for boxchars is working

3109e4f

Adjust mlir/test/Dialect/LLVMIR/omp-offload-privatization-prepare.mli…

d5b9c27

…r after change to using the init and copy regions

Address minor review comments

8bd4359

bhandarkar-pranav added 9 commits September 24, 2025 14:27

Handle the case where varPtr is a blockargument. - Take the varType f…

a393ff1

…rom the mapInfoOp

clean up the pass a little bit

28ec66a

Do not include OpenMPOffloadPrivatizationPrepare.h in Pipelines.h. In…

d95799d

…clude in Pipelines.cpp instead

Move PrepareForOMPOffloadPrivatizationPass from Transforms in LLVMIR …

0829719

…dialect to Transforms in OpenMP Dialect

Add a lit test for boxchars

01d1f69

Make createFuncForRegionAndCallIt take an arrayref as argumen

d595572

clean up

bf949fa

make clang-format happy

72a769f

Add some more comments and fix a typo

061669f

tblah reviewed Oct 2, 2025

bhandarkar-pranav added 4 commits October 5, 2025 00:21

Checkpoint commit - dealloc working - need to fix lit testcase to tes…

e036923

…t for dealloc

fix lit testcase for dealloc region generation

5b78bab

fix clang-format problems

b34b0e0

Update test for differences between downstream and upstream

4c5d8d8

ergawy reviewed Oct 9, 2025

tblah reviewed Oct 15, 2025

Meinersbur reviewed Oct 15, 2025

bhandarkar-pranav added 2 commits October 15, 2025 15:11

Make changes requested by reviewers - Formatting changes requested by…

6288da9

… meinersbur - Use the first heap-allocated private as fakeDependVar as suggested by tblah.

Update test (again) for differences between downstream and upstream

0a4ef58

tblah reviewed Oct 16, 2025

ergawy reviewed Oct 16, 2025
